On the Structure of Boolean Functions with Small Spectral Norm

Amir Shpilka* Avishay Tal† Ben Lee Volk*



Abstract 



In this paper we prove results regarding Boolean functions with small spectral norm (the
spectral norm of $f$ is $\|\hat{f}\|_1 = \sum_{\alpha} |\hat{f}(\alpha)|$). Specifically, we prove the following results for functions
$f : \{0,1\}^n \to \{0,1\}$ with $\|\hat{f}\|_1 = A$.

1. There is a subspace $V$ of co-dimension at most $A^2$ such that $f|_V$ is constant.

2. $f$ can be computed by a parity decision tree of size $2^{A^2} n^{2A}$. (A parity decision tree is a
decision tree whose nodes are labeled with arbitrary linear functions.)

3. If in addition $f$ has at most $s$ nonzero Fourier coefficients, then $f$ can be computed by a
parity decision tree of depth $A^2 \log s$.

4. For every $\epsilon > 0$ there is a parity decision tree of depth $O(A^2 + \log(1/\epsilon))$ and size
$2^{O(A^2)} \cdot \min\{1/\epsilon^2, O(\log(1/\epsilon))^{2A}\}$ that $\epsilon$-approximates $f$. Furthermore, this tree can be learned,
with probability $1 - \delta$, using $\mathrm{poly}(n, \exp(A^2), 1/\epsilon, \log(1/\delta))$ membership queries.

All the results above also hold (with a slight change in parameters) for functions $f : \mathbb{Z}_p^n \to \{0,1\}$.
1 Introduction

The Fourier transform is one of the most useful tools in the analysis of Boolean functions. It 
is a household name in many areas of theoretical computer science: Learning theory (cf. [KM93,
LMN93, Man94]); Hardness of approximation (cf. [Hås01]); Property testing (cf. [BLR93, BCH+96,
GOS+11]); Social choice (cf. [KKL88, Kal02]) and more. The reader interested in the Fourier
transform and its applications is referred to the online book [O'D12].

A common theme in the study of the Fourier transform is the question of classifying all Boolean
functions whose Fourier transforms share some natural property. For example, Friedgut proved that
Boolean functions that have small influence are close to being juntas (i.e. functions that depend
on a small number of coordinates) [Fri98]. Friedgut, Kalai and Naor proved that Boolean functions
whose Fourier spectrum is concentrated on the first two levels are close to dictator functions (i.e.
functions of the form $f(x_1, \ldots, x_n) = x_i$ or $1 - x_i$). In [ZS10, MO09] it was conjectured that
a Boolean function that has a sparse Fourier spectrum (i.e. that has only $s$ nonzero Fourier
coefficients) can be computed by a parity decision tree (for short we denote parity decision tree
by ⊕-DT) of depth $\mathrm{poly}(\log s)$. Recall that in a ⊕-DT nodes are labeled by linear functions (over
$\mathbb{Z}_2$) rather than by variables. It is well known that a function that is computed by a depth $d$
⊕-DT has sparsity at most $\exp(d)$ (see Lemma 2.5), so this conjecture implies a (more or less)
tight result. This conjecture was raised in the context of the log-rank conjecture in communication
complexity and, if true, it would imply that the log-rank conjecture is true for functions of the form
$F(x, y) = f(x \oplus y)$, for some Boolean function $f$.

In this paper we are interested in the structure of functions that have small spectral norm.
Namely, in Boolean functions $f : \{0,1\}^n \to \{0,1\}$ that for some number $A$ satisfy

$$\|\hat{f}\|_1 \stackrel{\mathrm{def}}{=} \sum_{\alpha} |\hat{f}(\alpha)| \le A, \qquad (1)$$

where $A$ may depend on the number of variables $n$ (for definitions see Section 2). Such functions
were studied in the context of circuit complexity (cf. [Gro97]) and, more notably, in learning
theory, where it is one of the most general families of Boolean functions that can be learned efficiently
[KM93, Man94, ABF+08]. In particular, Kushilevitz and Mansour proved that any Boolean
function satisfying (1) can be well approximated by a sparse polynomial [KM93]. This already gives
some rough structure for functions with small spectral norm; however, one may ask for a more
refined structure that captures the function exactly. Green and Sanders were the first to obtain such
a result (and until this work this was the only such result). They proved that if $f$ satisfies
Equation (1) then it can be expressed as a sum of at most $2^{2^{O(A^4)}}$ characteristic functions of subspaces,
that is,

$$f = \sum_{i=1}^{2^{2^{O(A^4)}}} \pm \mathbf{1}_{V_i}, \qquad (2)$$

where each $V_i$ is a subspace. Thus, when $A$ is constant this gives a very strong result on the
structure of such a function $f$. This result can be seen as an inverse theorem, as it is well known
and easy to see that the spectral norm of the characteristic function of a subspace is constant. Thus,
[GS08a] show that in general, any function with a small spectral norm is a linear combination of
a (relatively) small number of such characteristic functions. Of course, ideally one would like to
show that the number of functions in the sum is at most $\mathrm{poly}(A)$ and not doubly exponential in $A$;
however, Green and Sanders note that "it seems to us that it would be difficult to use our method
to reduce the number of exponentials below two."



It is possible that another classification of Boolean functions with small spectral norm could be
achieved using decision trees, or more generally, parity decision trees. It is not hard to show that if
a Boolean function $g$ is computed by a ⊕-DT with $s$ leaves then the spectral norm of $g$ is at most $s$
(see Lemma 2.5). Interestingly, we are not aware of any Boolean function that has a small spectral
norm and that cannot be computed by a small ⊕-DT. It is thus an interesting question whether
this is indeed the general case, namely, that any function of small spectral norm can be computed
by a small ⊕-DT. We note that the result of [GS08a] does not yield such a structure. Indeed, if we
were to represent the function given by Equation (2) as a ⊕-DT, without knowing anything
more about the function, then we do not see a more efficient representation than the brute-force
one that yields a ⊕-DT of size n

Another interesting question concerning functions with small spectral norm comes from the
learning theory perspective. As mentioned above, Kushilevitz and Mansour proved that for any
Boolean function satisfying Equation (1) there is some sparse polynomial $g = \sum_{i=1}^{A^2/\epsilon} \hat{f}(\alpha_i)\chi_{\alpha_i}(x)$
(where the coefficients in the summation are the $A^2/\epsilon$ largest Fourier coefficients of $f$) such that
$\Pr_x[f(x) \ne \mathrm{sgn}(g(x))] \le \epsilon$. Thus, their learning algorithm outputs as hypothesis the function
$\mathrm{sgn}(g(x))$. This is the case even if $f$ is computed by a small decision tree or a small ⊕-DT. It would
be desirable to output a hypothesis coming from the same complexity class as $f$, i.e. to output
a decision tree or a ⊕-DT. However, a hardness result of [ABF+08] shows that under reasonable
complexity assumptions, one cannot hope to output a small decision tree approximating $f$. So, a
refinement of the question should be to try and output the smallest tree one can find for a function
approximating $f$. For example, the function

$$\mathrm{sgn}(g) = \mathrm{sgn}\left(\sum_{i=1}^{A^2/\epsilon} \hat{f}(\alpha_i)\chi_{\alpha_i}(x)\right) \qquad (3)$$

can be computed by a ⊕-DT of depth $O(A^2/\epsilon)$ in the natural way. Even when $A$ is a constant and
$\epsilon$ is polynomially small this does not give much information. Thus, a natural question is to try and
find a better representation for such a range of parameters.

1.1 Our results 

Our first result identifies a local structure shared by Boolean functions with small spectral norm. 

Theorem 1.1. Let $f : \{0,1\}^n \to \{0,1\}$ be such that $\|\hat{f}\|_1 = A$. Then there is an affine subspace
$V \subseteq \{0,1\}^n$ of co-dimension at most $A^2$ such that $f$ is constant on $V$.

We note that the proof of [GS08a] does not imply the existence of such an affine subspace $V$ of
such a high dimension. Our next result gives a ⊕-DT computing $f$.

Theorem 1.2. Let $f : \{0,1\}^n \to \{0,1\}$ be such that $\|\hat{f}\|_1 = A$. Then $f$ can be computed by a ⊕-DT
of size $2^{A^2} n^{2A}$.

In particular, the theorem implies that $f = \sum_{i=1}^{2^{A^2} n^{2A}} \mathbf{1}_{V_i}$, where each $V_i$ is a subspace.

Another result settles the conjecture of [ZS10, MO09] for the case of sparse Boolean functions
with small spectral norm.

Theorem 1.3. Let $f : \{0,1\}^n \to \{0,1\}$ be such that $\|\hat{f}\|_1 = A$ and $|\{\alpha \mid \hat{f}(\alpha) \ne 0\}| = s$. Then $f$
can be computed by a ⊕-DT of depth $A^2 \log s$.



Thus, if the spectral norm of $f$ is constant (or $\mathrm{poly}(\log s)$), Theorem 1.3 settles the conjecture
affirmatively. The conjecture is still open for the case where the spectral norm of $f$ is large.

Our last result (for functions over the Boolean cube) fits into the context of learning theory
and provides a bound on the depth of a ⊕-DT approximating a function with a small spectral
norm. Here, the distance between two Boolean functions is measured with respect to the uniform
distribution, namely, $\mathrm{dist}(f,g) = \Pr_{x \in \{0,1\}^n}[f(x) \ne g(x)]$.

Theorem 1.4. Let $f : \{0,1\}^n \to \{0,1\}$ be such that $\|\hat{f}\|_1 = A$. Then for every $\delta, \epsilon > 0$ there is a
randomized algorithm that, given a query oracle to $f$, outputs (with probability at least $1 - \delta$) a
⊕-DT of depth $O(A^2 + \log(1/\epsilon))$ and size $2^{O(A^2)} \cdot \min\{1/\epsilon^2, O(\log(1/\epsilon))^{2A}\}$, which computes a Boolean
function $g_\epsilon$ such that $\mathrm{dist}(f, g_\epsilon) \le \epsilon$. The algorithm runs in time polynomial in $n$, $\exp(A^2)$, $1/\epsilon$ and
$\log(1/\delta)$.

Thus, when $A$ is a constant and $\epsilon$ is polynomially small, the depth is $O(\log n)$ and the size is only
poly-logarithmic in $n$. This greatly improves upon the representation guaranteed by Equation (3).
If one insists on outputting a ⊕-DT, then, for all ranges of parameters, the tree that we obtain is
much smaller than the tree guaranteed by Equation (3).

We also prove analogs of the theorems above for functions $f : \mathbb{Z}_p^n \to \{+1,-1\}$ having small
spectral norm. Namely, in the theorems above one could instead talk of $f : \mathbb{Z}_p^n \to \{0,1\}$ and
obtain essentially the same results.^1 Theorems 4.7, 4.8, 4.10 and 4.11 are the $\mathbb{Z}_p$ analogs to
Theorems 1.1, 1.2, 1.3 and 1.4, respectively. We note that in [GS08b] Green and Sanders extended their
result to hold for functions mapping an abelian group $G$ to $\{0,1\}$, obtaining the same bound as in
[GS08a], so our result for functions on $\mathbb{Z}_p^n$ could be seen as an analog to their result for such groups.

1.2 Comparison with [GS08a]

Comparing Theorem 1.2 to Equation (2) (that was proved in [GS08a]), we note that while
Equation (2) does not involve the number of variables (i.e. the upper bound on the number of subspaces
only involves $A$), our result does involve $n$. On the other hand, we give a more refined structure,
that of a parity decision tree, which is not implied by Equation (2) (see also the discussion above).
Moreover, when $A = \Omega((\log \log n)^{1/4})$, our bound is much better than the one given in Equation (2).

Our proof technique is also quite different from that of [GS08a]. Their proof idea is to represent
$f$ as $f = f_1 + f_2$ where the Fourier supports of $f_1$ and $f_2$ are disjoint, and such that $f_1$ and $f_2$ are
close to being integer valued and have a somewhat smaller spectral norm. Then, using recursion,
they represent each $f_i$ as a sum of a small number of characteristic functions of subspaces. In
particular, Green and Sanders do not restrict their treatment to Boolean functions but rather study
functions that at every point of the Boolean cube obtain a value that is almost an integer. Thus,
they prove a more general result, namely, that $f_{\mathbb{Z}}$, the integer part of $f$, can be represented in the
form of Equation (2). We, on the other hand, only work with Boolean functions, so their result is
stronger in that respect. However, while their proof was a bit involved and required using results
from additive combinatorics, our approach is more elementary and is based on exploiting the fact
that $f$ is Boolean. In particular, our starting point is an analysis of the simple equation $f^2 = 1$
(when we think of $f$ as mapping $\{0,1\}^n$ to $\{\pm 1\}$). Furthermore, we are able to use the fact that
$f$ is Boolean in order to show that it can be computed by a small ⊕-DT, which does not seem to
follow from [GS08a].

Green and Sanders later extended their technique and proved a similar result for functions over
general abelian groups $f : G \to \{0,1\}$ [GS08b]. Our techniques do not extend to general groups,
but we do obtain results for the case that $G = \mathbb{Z}_p^n$, which again have the same advantages and
disadvantages compared to the result of [GS08b] (although the simplicity of our approach is even
more evident here).

^1 Of course, one would have to speak about the analog of a ⊕-DT for the case where the inputs come from $\mathbb{Z}_p$.

1.3 Proof idea 

As mentioned above, our proof relies on the simple equation $f^2 = 1$ (when we think of $f : \{0,1\}^n \to
\{\pm 1\}$). By expanding the Fourier representations (see Section 2 for definitions) of both sides we
reach the identity

$$\sum_{\gamma} \hat{f}(\gamma)\hat{f}(\delta + \gamma) = 0$$

that holds for all $\delta \ne 0$ (see Lemma 3.2). This identity could be interpreted as saying that the
mass on pairs whose product is positive is the same as the mass on pairs whose product is negative.
In particular, if we consider the two heaviest elements in the Fourier spectrum, say, $\hat{f}(\alpha)$ and $\hat{f}(\beta)$,
and let $\delta = \alpha + \beta$, then by restricting $f$ to one of the subspaces $\chi_\delta(x) = 1$ or $\chi_\delta(x) = -1$, we get a
substantial saving in the spectral norm (see Lemma 3.1). This happens since there is a significant
$L_1$ mass on pairs $\hat{f}(\gamma), \hat{f}(\delta + \gamma)$ that have different signs. By repeating this process we manage to
prove the existence of a small ⊕-DT for $f$.

The argument for functions over $\mathbb{Z}_p^n$ is similar, but requires more technical work. For that reason
we decided to give a separate proof for the case of functions over the Boolean cube, and then, after
the ideas were laid out in their simpler form, to prove the results in the more general case.



1.4 The work of Tsang et al. [TWXZ13]

Independently and simultaneously to our work, Tsang et al. [TWXZ13] obtained related results.
The main objective of the work [TWXZ13] was to study the communication complexity of sparse
Boolean functions. These are functions $f$ such that the communication matrix of the function
$F(x, y) = f(x \oplus y)$ has low rank. Resolving the log-rank conjecture from communication complexity
for such functions was the main motivation for the conjecture raised in [MO09] and [ZS10].

Tsang et al. managed to prove a stronger version of our Theorem 1.1; namely, they proved
that $f$ is constant on a subspace of co-dimension at most $O(A)$. Their argument is identical to
ours (namely, to the one given in Lemma 3.1) except that they observe that after $O(A)$ steps of
increasing the largest Fourier coefficient of $f$, it grows to at least 1/2. From that point on they make
use of the simple observation that the proof of (their equivalent of) Lemma 3.1 actually guarantees
that the restriction that saves the most in the spectral norm keeps increasing the largest coefficient.
Thus, from then on the spectral norm decreases by at least some constant at each step, and hence an additional
$O(A)$ steps would make $f$ constant.^2

This immediately improves the results in Theorems 1.1 and 1.3; we can now change the factor
$A^2$ to $A$ in both.

We also note that the work [TWXZ13] does not contain analogs for Theorems 1.2 and 1.4.
Tsang et al. did not study the case of functions from $\mathbb{Z}_p^n$ to $\{0,1\}$, and so they do not have analogs
of Theorems 4.7, 4.8, 4.10 and 4.11.

^2 Our Lemma 3.1 only speaks about the spectral norm, but the effect on the largest Fourier coefficient is obvious
from the proof.



1.5 Organization 

Section 2 contains the basic background and definitions. In Section 3 we prove our results for
functions $f : \mathbb{Z}_2^n \to \{+1,-1\}$. The results for functions on $\mathbb{Z}_p^n$ are given in Section 4. Finally, in
Section 5 we discuss problems left open by this work.

2 Notation and Basic Results 

It will be more convenient for us to talk about functions $f : \{0,1\}^n \to \{\pm 1\}$. Note that if
$f : \{0,1\}^n \to \{0,1\}$ then $1 - 2f : \{0,1\}^n \to \{\pm 1\}$, and $1 - 2f$ and $f$ have roughly the same spectral
norm (up to a multiplicative factor of 2) and the same Fourier sparsity (up to $\pm 1$).

2.1 Decision trees and parity decision trees 

In this section we define the basic computational models that we shall consider in the paper. 

Definition 2.1 (Decision tree). A decision tree is a labeled binary tree $T$. Each internal node of
$T$ is labeled with a variable $x_i$, and each leaf by a bit $b \in \{+1,-1\}$. Given an input $x \in \mathbb{Z}_2^n$, a
computation over the tree is executed as follows: Starting at the root, stop if it's a leaf, and output
its label. Otherwise, query its label $x_i$. If $x_i = 0$, then recursively evaluate the left subtree, and if
$x_i = 1$, evaluate the right subtree.

A decision tree $T$ computes a function $f$ if for every $x \in \mathbb{Z}_2^n$, the computation of $x$ over $T$
outputs $f(x)$. The depth of a decision tree is the maximal length of a path from the root to
a leaf. The decision tree complexity of $f$, denoted $D(f)$, is the depth of a minimal-depth tree
computing $f$. Since one can always simply query all the variables of the input, it holds that for any
Boolean function $f$, $D(f) \le n$. A comprehensive survey of decision tree complexity can be found
in [BdW02].

In the context of Fourier analysis, even a function with simple Fourier spectrum, such as the 
parity function over n bits, which has only 1 nonzero Fourier coefficient, requires a full binary 
decision tree for its computation, and in particular its depth is n. This example suggests that a 
more suitable computational model for understanding the connection between the computational 
complexity and the Fourier expansion of a function is the parity decision tree model, first presented 
by Kushilevitz and Mansour [KM93].

Definition 2.2 (⊕-DT). A parity decision tree is a labeled binary tree $T$, in which every internal
node is labeled by a linear function $\alpha \in \mathbb{Z}_2^n$, and each leaf with a bit $b \in \{+1,-1\}$. Whenever a
computation over an input $x$ arrives at an internal node, it queries $\langle \alpha, x \rangle$ (where the inner product
is carried modulo 2). If $\langle \alpha, x \rangle = 0$ it recursively evaluates the left subtree, and if $\langle \alpha, x \rangle = 1$, it
evaluates the right subtree. When the computation reaches a leaf it outputs its label.

Namely, a ⊕-DT can make an arbitrary linear query in every internal node (and in particular,
compute the parity of $n$ bits using a single query). Since a query of a single variable is linear, this
model is an extension of the regular decision tree model.

The depth of the minimal-depth parity decision tree which computes $f$ is denoted $D_\oplus(f)$, thus
$D_\oplus(f) \le D(f)$. As the example of the parity function shows, the parity decision tree model is
strictly stronger than the model of decision trees. We also denote by $\mathrm{size}_\oplus(f)$ the size (i.e. number
of leaves) of a minimal-size ⊕-DT computing $f$.
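To make Definition 2.2 concrete, the following is a minimal sketch of evaluating a ⊕-DT. The tuple encoding of trees, the use of integer bit masks for vectors in $\mathbb{Z}_2^n$, and the names `inner`, `evaluate`, `depth` and `size` are illustrative assumptions, not notation from the paper.

```python
# A tree is either ("leaf", b) with b in {+1, -1}, or
# ("node", alpha, left, right), where alpha is an n-bit mask for a linear query.
# (This encoding is an assumption made for illustration.)

def inner(alpha, x):
    """Inner product <alpha, x> modulo 2, for bit masks alpha and x."""
    return bin(alpha & x).count("1") % 2

def evaluate(tree, x):
    """Follow the path determined by the linear queries and return the leaf label."""
    while tree[0] == "node":
        _, alpha, left, right = tree
        tree = left if inner(alpha, x) == 0 else right
    return tree[1]

def depth(tree):
    """Maximal number of queries on a root-to-leaf path."""
    if tree[0] == "leaf":
        return 0
    return 1 + max(depth(tree[2]), depth(tree[3]))

def size(tree):
    """Number of leaves."""
    if tree[0] == "leaf":
        return 1
    return size(tree[2]) + size(tree[3])

# Parity of n = 3 bits (with -1 encoding "True") needs a single linear query.
n = 3
parity_tree = ("node", 0b111, ("leaf", +1), ("leaf", -1))
```

A standard decision tree is the special case in which every mask `alpha` has exactly one bit set; for parity such a tree needs depth $n$, whereas the tree above has depth 1.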

As a helpful tool, we extend the parity decision tree model to a functional parity decision tree
model, in which we allow every leaf to be labeled with a Boolean function, rather than only by a
constant. A functional ⊕-DT $T$ then computes a function $f$ if for every leaf $\ell$ of $T$, its label equals
the restriction of $f$ to the affine subspace defined by the constraints that appear on the path from
$T$'s root to $\ell$.

2.2 Fourier Transform 

We represent Boolean functions as functions $f : \mathbb{Z}_2^n \to \{+1,-1\} \subseteq \mathbb{R}$ where $-1$ represents the
Boolean value "True" and $1$ represents the Boolean value "False". For a vector $\alpha$ of $n$ bits, $\alpha_i$
denotes its $i$-th coordinate. The set of $2^n$ group characters $\{\chi_\alpha : \mathbb{Z}_2^n \to \{+1,-1\} \mid \alpha \in \mathbb{Z}_2^n\}$, with
$\chi_\alpha(x) = (-1)^{\sum_{i=1}^{n} \alpha_i x_i}$ for every $\alpha \in \mathbb{Z}_2^n$, forms a basis of the vector space of functions from $\mathbb{Z}_2^n$ into
$\mathbb{R}$. Furthermore, the basis is orthonormal with respect to the inner product^3

$$\langle f, g \rangle = \mathbb{E}_x[f(x)g(x)],$$

where the expectation is taken over the uniform distribution over $\mathbb{Z}_2^n$. The Fourier expansion of
a function $f : \mathbb{Z}_2^n \to \{+1,-1\}$ is its unique representation as a linear combination of those group
characters:

$$f(x) = \sum_{\alpha \in \mathbb{Z}_2^n} \hat{f}(\alpha)\chi_\alpha(x).$$

Two of the basic identities of Fourier analysis, which follow from the orthonormality of the basis,
are:

1. $\hat{f}(\alpha) = \langle f, \chi_\alpha \rangle = \mathbb{E}_x[f(x)\chi_\alpha(x)]$.

2. (Plancherel's Theorem) $\langle f, g \rangle = \mathbb{E}_x[f(x)g(x)] = \sum_{\alpha \in \mathbb{Z}_2^n} \hat{f}(\alpha)\hat{g}(\alpha)$.

The case $f = g$ in Plancherel's theorem is called Parseval's Identity. Furthermore, when $f$ is
Boolean, $f^2 = 1$, which implies

$$\sum_{\alpha} \hat{f}(\alpha)^2 = 1. \qquad (4)$$

We define two basic complexity measures for Boolean functions: 

Definition 2.3. Let $f : \mathbb{Z}_2^n \to \{+1,-1\}$ be a Boolean function. The sparsity of $f$, denoted $\mathrm{spar}(f)$,
is the number of non-zero Fourier coefficients, namely

$$\mathrm{spar}(f) = \#\{\alpha \in \mathbb{Z}_2^n \mid \hat{f}(\alpha) \ne 0\}.$$

A function $f$ is said to be $s$-sparse if $\mathrm{spar}(f) \le s$.

Definition 2.4. Let $f : \mathbb{Z}_2^n \to \{+1,-1\}$ be a Boolean function. The $L_1$ norm (also dubbed the
spectral norm) of $f$ is defined as

$$\|\hat{f}\|_1 = \sum_{\alpha} |\hat{f}(\alpha)|.$$

For every $f : \mathbb{Z}_2^n \to \{+1,-1\}$ it holds that $\|\hat{f}\|_1 \ge \|f\|_\infty = 1$ (where $\|f\|_\infty = \max_{x \in \mathbb{Z}_2^n} |f(x)|$).
We later show (Lemma 3.5) that equality is obtained if and only if $f = \pm\chi_\alpha$ for some $\alpha \in \mathbb{Z}_2^n$.
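As an illustration of Definitions 2.3 and 2.4, here is a brute-force sketch that computes the Fourier coefficients, the sparsity and the spectral norm of the AND function on 3 bits. The encoding of vectors as integer bit masks and the helper names (`chi`, `fourier`, `spectral_norm`, `sparsity`) are assumptions made for this example only. Because every coefficient is an integer divided by $2^n$, the floating-point arithmetic below is exact.

```python
def chi(alpha, x):
    """Character chi_alpha(x) = (-1)^<alpha, x> for n-bit masks alpha and x."""
    return -1 if bin(alpha & x).count("1") % 2 else 1

def fourier(values, n):
    """Fourier coefficients of f, where values[x] = f(x) in {+1, -1}."""
    size = 1 << n
    return [sum(values[x] * chi(a, x) for x in range(size)) / size
            for a in range(size)]

def spectral_norm(coeffs):
    """L1 norm of the Fourier spectrum (Definition 2.4)."""
    return sum(abs(c) for c in coeffs)

def sparsity(coeffs):
    """Number of non-zero Fourier coefficients (Definition 2.3)."""
    return sum(1 for c in coeffs if c != 0)

n = 3
# AND of 3 bits, with -1 encoding "True" and +1 encoding "False".
and3 = [-1 if x == 0b111 else 1 for x in range(1 << n)]
coeffs = fourier(and3, n)

norm = spectral_norm(coeffs)              # 3/4 + 7 * 1/4 = 2.5
spar = sparsity(coeffs)                   # all 8 coefficients are non-zero
parseval = sum(c * c for c in coeffs)     # equals 1 since f^2 = 1
```

The brute-force transform takes time $4^n$ and is meant only for small sanity checks.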

These measures are related to parity decision trees using the following simple lemma. For
completeness we give the proof of the lemma in Appendix A.

^3 Later, when we study functions over $\mathbb{Z}_p$, we define the inner product to be $\mathbb{E}_x[f(x)\overline{g(x)}]$.



Lemma 2.5. Let $f : \mathbb{Z}_2^n \to \{+1,-1\}$ be a Boolean function computed by a ⊕-DT $T$ of depth $k$ and
size $m$. Then:

1. $\mathrm{spar}(f) \le m 2^k \le 4^k$.

2. $\|\hat{f}\|_1 \le m \le 2^k$.



In the upcoming sections we consider restrictions of Boolean functions to (affine) subspaces
of $\mathbb{Z}_2^n$. We denote by $f|_V$ the restriction of $f$ to a subspace $V \subseteq \mathbb{Z}_2^n$. For any $\alpha \ne 0$, the set
$\{x \mid \chi_\alpha(x) = 1\}$ is a subspace of $\mathbb{Z}_2^n$ of co-dimension 1. The restriction of $f$ to this subspace
is denoted $f|_{\chi_\alpha = 1}$. Similarly, the set $\{x \mid \chi_\alpha(x) = -1\}$ is an affine subspace of co-dimension 1,
and we denote with $f|_{\chi_\alpha = -1}$ the restriction of $f$ to this subspace. It can be shown (cf. [O'D12],
Chapter 3, Section 3.3) that under such a restriction, the coefficients $\hat{f}(\beta)$ and $\hat{f}(\alpha + \beta)$ (for every
$\beta \in \mathbb{Z}_2^n$) collapse to a single Fourier coefficient whose absolute value is $|\hat{f}(\beta) + \hat{f}(\alpha + \beta)|$. Similarly,
in the Fourier transform of $f|_{\chi_\alpha = -1}$, they collapse to a single coefficient whose absolute value is
$|\hat{f}(\beta) - \hat{f}(\alpha + \beta)|$. This in particular implies that $\|\hat{f}\|_1$ and $\mathrm{spar}(f)$ do not increase when $f$ is
restricted to such a subspace. Indeed, both facts follow easily from the representation

$$f(x) = \sum_{\beta \in \mathbb{Z}_2^n / \langle \alpha \rangle} \left(\hat{f}(\beta) + \hat{f}(\beta + \alpha)\chi_\alpha(x)\right)\chi_\beta(x), \qquad (5)$$

where $\mathbb{Z}_2^n / \langle \alpha \rangle$ denotes the cosets of the group $\langle \alpha \rangle = \{0, \alpha\}$ in $\mathbb{Z}_2^n$. When studying a restricted
function, say $f' = f|_{\chi_\alpha(x) = 1}$, we shall abuse notation and denote with $\hat{f'}(\beta)$ the term corresponding
to the coset $\beta + \langle \alpha \rangle$. Namely, $\hat{f'}(\beta) = \hat{f}(\beta) + \hat{f}(\beta + \alpha)$ (similarly, for $f'' = f|_{\chi_\alpha(x) = -1}$, we shall
denote $\hat{f''}(\beta) = \hat{f}(\beta) - \hat{f}(\beta + \alpha)$). Thus, in $f'$ both $\hat{f'}(\beta)$ and $\hat{f'}(\beta + \alpha)$ refer to the same Fourier
coefficient as we only consider coefficients modulo $\langle \alpha \rangle$ (similarly for $f''$).
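The collapse of coefficients described by Equation (5) can be checked numerically. The sketch below restricts the majority function on 3 bits to $\chi_\alpha = \pm 1$ for $\alpha = 011$; the bit-mask encoding, the choice of coset representatives (the member of each pair lacking the top bit of $\alpha$) and the name `restrict` are assumptions of this illustration.

```python
def chi(alpha, x):
    """Character chi_alpha(x) = (-1)^<alpha, x> for n-bit masks alpha and x."""
    return -1 if bin(alpha & x).count("1") % 2 else 1

def fourier(values, n):
    """Fourier coefficients of f, where values[x] = f(x) in {+1, -1}."""
    size = 1 << n
    return [sum(values[x] * chi(a, x) for x in range(size)) / size
            for a in range(size)]

def restrict(coeffs, alpha, b):
    """Coefficients of f restricted to {x : chi_alpha(x) = b}.

    Each coset {beta, beta ^ alpha} is represented by its member that lacks
    the top bit of alpha; its coefficient is f^(beta) + b * f^(beta + alpha).
    """
    top = alpha.bit_length() - 1
    return {beta: coeffs[beta] + b * coeffs[beta ^ alpha]
            for beta in range(len(coeffs)) if not (beta >> top) & 1}

n = 3
# Majority of 3 bits, with -1 encoding "True".
maj3 = [-1 if bin(x).count("1") >= 2 else 1 for x in range(1 << n)]
coeffs = fourier(maj3, n)
alpha = 0b011

plus = restrict(coeffs, alpha, +1)    # f restricted to x_1 + x_2 = 0
minus = restrict(coeffs, alpha, -1)   # f restricted to x_1 + x_2 = 1
```

On the subspace where the first two bits agree, majority equals the first bit, so `plus` has a single non-zero coefficient; where they differ, majority equals the third bit. In both cases the spectral norm drops from 2 to 1.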

3 Boolean functions with small spectral Norm 

In this section we prove our main results for functions over the Boolean cube. While many of the 
proofs and techniques used for general primes also apply to the case $p = 2$, we find the case $p = 2$
substantially simpler, so we present the proofs for this case separately. 

3.1 Basic tools 

In this section we prove the following lemma, which states that for every Boolean function $f : \mathbb{Z}_2^n \to
\{+1,-1\}$ with small spectral norm, there exists a linear function $\chi_\gamma$ such that both restrictions
$f|_{\chi_\gamma = 1}$ and $f|_{\chi_\gamma = -1}$ have noticeably smaller spectral norms compared to $f$. In Section 4 we give a
generalization of the lemma for functions $f : \mathbb{Z}_p^n \to \{+1,-1\}$ (Lemma 4.1).



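Lemma 3.1 (Main Lemma for functions over $\mathbb{Z}_2^n$). Let $f : \mathbb{Z}_2^n \to \{+1,-1\}$ be a Boolean function.
Let $\hat{f}(\alpha)$ be $f$'s maximal Fourier coefficient in absolute value, and $\hat{f}(\beta)$ be the second largest, and
suppose $\hat{f}(\beta) \ne 0$. Let $f' = f|_{\chi_{\alpha+\beta} = 1}$ and $f'' = f|_{\chi_{\alpha+\beta} = -1}$. Then, if $\hat{f}(\alpha)\hat{f}(\beta) > 0$ then it holds
that

$$\|\hat{f'}\|_1 \le \|\hat{f}\|_1 - |\hat{f}(\alpha)| \quad \text{and} \quad \|\hat{f''}\|_1 \le \|\hat{f}\|_1 - |\hat{f}(\beta)|.$$

If $\hat{f}(\alpha)\hat{f}(\beta) < 0$ then

$$\|\hat{f'}\|_1 \le \|\hat{f}\|_1 - |\hat{f}(\beta)| \quad \text{and} \quad \|\hat{f''}\|_1 \le \|\hat{f}\|_1 - |\hat{f}(\alpha)|.$$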



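The proof of the lemma follows from analyzing the simple equation $f^2 = 1$.

Lemma 3.2. Let $f : \mathbb{Z}_2^n \to \{+1,-1\}$ be a Boolean function. For all $\alpha \ne 0$, it holds that

$$\sum_{\gamma} \hat{f}(\gamma)\hat{f}(\alpha + \gamma) = 0.$$

Proof. Since $f$ is Boolean we have that $f^2 = 1$. In the Fourier representation,

$$\left(\sum_{\gamma} \hat{f}(\gamma)\chi_\gamma(x)\right)\left(\sum_{\beta} \hat{f}(\beta)\chi_\beta(x)\right) = 1.$$

Then $\sum_{\gamma} \hat{f}(\gamma)\hat{f}(\alpha + \gamma)$ is the Fourier coefficient $\widehat{f^2}(\alpha)$ of the function $f^2$ at $\alpha$. However, if $\alpha \ne 0$
then this coefficient equals 0 by the uniqueness of the Fourier expansion of the function $f^2 = 1$. □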
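Lemma 3.2 is easy to confirm numerically on small examples. The sketch below evaluates $\sum_\gamma \hat{f}(\gamma)\hat{f}(\alpha + \gamma)$ for every $\alpha$, for the AND and majority functions on 3 bits; the bit-mask encoding (addition in $\mathbb{Z}_2^n$ becomes XOR) and the name `self_convolution` are assumptions of this illustration.

```python
def chi(alpha, x):
    """Character chi_alpha(x) = (-1)^<alpha, x> for n-bit masks alpha and x."""
    return -1 if bin(alpha & x).count("1") % 2 else 1

def fourier(values, n):
    """Fourier coefficients of f, where values[x] = f(x) in {+1, -1}."""
    size = 1 << n
    return [sum(values[x] * chi(a, x) for x in range(size)) / size
            for a in range(size)]

def self_convolution(coeffs, alpha):
    """Sum over gamma of f^(gamma) * f^(alpha + gamma): the coefficient of f^2 at alpha."""
    return sum(c * coeffs[alpha ^ gamma] for gamma, c in enumerate(coeffs))

n = 3
and3 = [-1 if x == 0b111 else 1 for x in range(1 << n)]
maj3 = [-1 if bin(x).count("1") >= 2 else 1 for x in range(1 << n)]

# For a Boolean f, the result is 1 at alpha = 0 (Parseval) and 0 elsewhere.
spectra_of_squares = {
    name: [self_convolution(fourier(values, n), alpha) for alpha in range(1 << n)]
    for name, values in (("and3", and3), ("maj3", maj3))
}
```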



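Proof of Lemma 3.1. Without loss of generality assume that $\hat{f}(\alpha)\hat{f}(\beta) > 0$, i.e. they have the same
sign (the other case is completely analogous). By Lemma 3.2,

$$\sum_{\gamma} \hat{f}(\gamma)\hat{f}(\alpha + \beta + \gamma) = 0. \qquad (6)$$

Let $N_{\alpha+\beta} \subseteq \mathbb{Z}_2^n$ be the set of vectors $\gamma$ such that $\hat{f}(\gamma)\hat{f}(\alpha + \beta + \gamma) < 0$ (note that by assumption,
$\alpha, \beta \notin N_{\alpha+\beta}$). Switching sides in (6), and noting that the terms $\gamma = \alpha$ and $\gamma = \beta$ each contribute $\hat{f}(\alpha)\hat{f}(\beta)$, we get:

$$\hat{f}(\alpha)\hat{f}(\beta) = \frac{1}{2}\sum_{\gamma \in N_{\alpha+\beta}} |\hat{f}(\gamma)\hat{f}(\alpha + \beta + \gamma)| - \frac{1}{2}\sum_{\gamma \notin N_{\alpha+\beta} \cup \{\alpha, \beta\}} |\hat{f}(\gamma)\hat{f}(\alpha + \beta + \gamma)|.$$

In particular,

$$|\hat{f}(\alpha)||\hat{f}(\beta)| \le \frac{1}{2}\sum_{\gamma \in N_{\alpha+\beta}} |\hat{f}(\gamma)\hat{f}(\alpha + \beta + \gamma)|. \qquad (7)$$

We now use the fact that $\hat{f}(\beta)$ is the second largest in absolute value, and $\hat{f}(\alpha)$ does not
appear in the sum, to bound the right hand side:

$$\sum_{\gamma \in N_{\alpha+\beta}} |\hat{f}(\gamma)\hat{f}(\alpha + \beta + \gamma)| \le |\hat{f}(\beta)| \sum_{\gamma \in N_{\alpha+\beta}} \min\{|\hat{f}(\gamma)|, |\hat{f}(\alpha + \beta + \gamma)|\}. \qquad (8)$$

Then (7) and (8) (as well as the assumption $|\hat{f}(\beta)| > 0$) together imply

$$|\hat{f}(\alpha)| \le \frac{1}{2}\sum_{\gamma \in N_{\alpha+\beta}} \min\{|\hat{f}(\gamma)|, |\hat{f}(\alpha + \beta + \gamma)|\}. \qquad (9)$$

Let $f' = f|_{\chi_{\alpha+\beta} = 1}$. Then for every $\gamma$ the coefficients $\hat{f}(\gamma)$ and $\hat{f}(\alpha + \beta + \gamma)$ collapse to a single
coefficient whose absolute value is $|\hat{f}(\gamma) + \hat{f}(\alpha + \beta + \gamma)|$ (recall Equation (5)). For $\gamma \in N_{\alpha+\beta}$,

$$|\hat{f}(\gamma) + \hat{f}(\alpha + \beta + \gamma)| = \big||\hat{f}(\gamma)| - |\hat{f}(\alpha + \beta + \gamma)|\big|,$$

which reduces the $L_1$ norm of $f'$ compared to that of $f$ by at least $\min\{|\hat{f}(\gamma)|, |\hat{f}(\alpha + \beta + \gamma)|\}$. In
total, since both $\gamma$ and $\alpha + \beta + \gamma$ belong to $N_{\alpha+\beta}$, we get:

$$\|\hat{f'}\|_1 \le \|\hat{f}\|_1 - \frac{1}{2}\sum_{\gamma \in N_{\alpha+\beta}} \min\{|\hat{f}(\gamma)|, |\hat{f}(\alpha + \beta + \gamma)|\}.$$

Therefore by (9) we have

$$\|\hat{f'}\|_1 \le \|\hat{f}\|_1 - |\hat{f}(\alpha)|.$$

When we consider $f'' = f|_{\chi_{\alpha+\beta} = -1}$ we clearly have that for $\gamma = \alpha$,

$$|\hat{f''}(\gamma)| = |\hat{f}(\gamma) - \hat{f}(\alpha + \beta + \gamma)| = |\hat{f}(\alpha)| - |\hat{f}(\beta)|.$$

Hence,

$$\|\hat{f''}\|_1 \le \|\hat{f}\|_1 - |\hat{f}(\beta)|.$$

□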
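The statement of Lemma 3.1 can be sanity-checked by brute force on small functions. The sketch below picks the two largest coefficients (ties broken by index, an arbitrary choice), restricts on $\chi_{\alpha+\beta} = \pm 1$, and compares the resulting spectral norms with the bounds of the lemma. The bit-mask encoding and the names `restricted_norm` and `main_lemma_holds` are assumptions of this illustration.

```python
def chi(alpha, x):
    """Character chi_alpha(x) = (-1)^<alpha, x> for n-bit masks alpha and x."""
    return -1 if bin(alpha & x).count("1") % 2 else 1

def fourier(values, n):
    """Fourier coefficients of f, where values[x] = f(x) in {+1, -1}."""
    size = 1 << n
    return [sum(values[x] * chi(a, x) for x in range(size)) / size
            for a in range(size)]

def restricted_norm(coeffs, delta, b):
    """Spectral norm of f restricted to chi_delta = b.

    Each pair {g, g ^ delta} collapses to f^(g) + b * f^(g + delta); we sum
    absolute values over one representative per pair.
    """
    top = delta.bit_length() - 1
    return sum(abs(coeffs[g] + b * coeffs[g ^ delta])
               for g in range(len(coeffs)) if not (g >> top) & 1)

def main_lemma_holds(values, n):
    """Return True/False for the bounds of Lemma 3.1, or None if f^(beta) = 0."""
    coeffs = fourier(values, n)
    order = sorted(range(len(coeffs)), key=lambda a: (-abs(coeffs[a]), a))
    alpha, beta = order[0], order[1]
    if coeffs[beta] == 0:
        return None  # the lemma assumes a non-zero second coefficient
    norm = sum(abs(c) for c in coeffs)
    big, small = abs(coeffs[alpha]), abs(coeffs[beta])
    plus = restricted_norm(coeffs, alpha ^ beta, +1)
    minus = restricted_norm(coeffs, alpha ^ beta, -1)
    if coeffs[alpha] * coeffs[beta] > 0:
        return plus <= norm - big and minus <= norm - small
    return plus <= norm - small and minus <= norm - big

n = 3
and3 = [-1 if x == 0b111 else 1 for x in range(1 << n)]
and3_ok = main_lemma_holds(and3, n)
```

For AND on 3 bits the two heaviest coefficients are $\hat{f}(0) = 3/4$ and a coefficient of magnitude $1/4$ on a single variable, so the query is that variable: one restriction is constant (norm 1) and the other is AND on 2 bits (norm 2), both within the bounds $2.5 - 0.75$ and $2.5 - 0.25$.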

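Next, we show that any Boolean function with small spectral norm has a large Fourier coefficient.

Lemma 3.3. Let $f : \mathbb{Z}_2^n \to \{+1,-1\}$ be a Boolean function. Denote $A = \|\hat{f}\|_1$, and let $\hat{f}(\alpha)$ be $f$'s
maximal Fourier coefficient in absolute value. Then $|\hat{f}(\alpha)| \ge 1/A$. Furthermore, let $\hat{f}(\beta)$ be $f$'s
second largest Fourier coefficient in absolute value. Then $|\hat{f}(\beta)| \ge (1 - \hat{f}(\alpha)^2)/\|\hat{f}\|_1 = (1 - \hat{f}(\alpha)^2)/A$.

Proof. By Parseval's identity, $\sum_{\gamma} \hat{f}(\gamma)^2 = 1$.
Now note that

$$1 = \sum_{\gamma} \hat{f}(\gamma)^2 \le |\hat{f}(\alpha)| \sum_{\gamma} |\hat{f}(\gamma)| = |\hat{f}(\alpha)| \cdot A,$$

which implies that indeed $|\hat{f}(\alpha)| \ge 1/A$. The second statement follows similarly, since

$$1 - \hat{f}(\alpha)^2 = \sum_{\gamma \ne \alpha} \hat{f}(\gamma)^2 \le |\hat{f}(\beta)| \sum_{\gamma \ne \alpha} |\hat{f}(\gamma)| \le \|\hat{f}\|_1 \cdot |\hat{f}(\beta)| = A|\hat{f}(\beta)|.$$

□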
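A quick numeric check of Lemma 3.3, under the same illustrative bit-mask encoding as before (the helper name `norm_and_top_two` is an assumption). The two inequalities are tested in the multiplied-out form $|\hat{f}(\alpha)| \cdot A \ge 1$ and $|\hat{f}(\beta)| \cdot A \ge 1 - \hat{f}(\alpha)^2$, which keeps the floating-point arithmetic exact for these dyadic values.

```python
def chi(alpha, x):
    """Character chi_alpha(x) = (-1)^<alpha, x> for n-bit masks alpha and x."""
    return -1 if bin(alpha & x).count("1") % 2 else 1

def fourier(values, n):
    """Fourier coefficients of f, where values[x] = f(x) in {+1, -1}."""
    size = 1 << n
    return [sum(values[x] * chi(a, x) for x in range(size)) / size
            for a in range(size)]

def norm_and_top_two(values, n):
    """Return (A, largest |coefficient|, second largest |coefficient|)."""
    magnitudes = sorted((abs(c) for c in fourier(values, n)), reverse=True)
    return sum(magnitudes), magnitudes[0], magnitudes[1]

n = 3
and3 = [-1 if x == 0b111 else 1 for x in range(1 << n)]
A, first, second = norm_and_top_two(and3, n)

first_bound_holds = first * A >= 1                  # |f^(alpha)| >= 1/A
second_bound_holds = second * A >= 1 - first ** 2   # |f^(beta)| >= (1 - f^(alpha)^2)/A
```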

Corollary 3.4. Let $f : \mathbb{Z}_2^n \to \{+1,-1\}$ be a Boolean function such that $\|\hat{f}\|_1 = A > 1$. Then there
exists $\gamma \in \mathbb{Z}_2^n$ and $b \in \{+1,-1\}$ such that $\|\widehat{f|_{\chi_\gamma = b}}\|_1 \le A - 1/A$.

Proof. The assumption $A > 1$ implies the second largest coefficient, $\hat{f}(\beta)$, is non-zero, and then the
result is immediate from Lemma 3.1 and Lemma 3.3. □



3.2 Proofs of Theorems 

We now show how Theorems 1.1, 1.2, 1.3 and 1.4 follow as simple consequences of Lemma 3.1.

Lemma 3.5. Let $f : \mathbb{Z}_2^n \to \{+1,-1\}$ be a Boolean function such that $\|\hat{f}\|_1 = 1$. Then $f = \pm\chi_\alpha$
for some $\alpha \in \mathbb{Z}_2^n$.

Proof. By Parseval's identity and the assumption, we get

$$\sum_{\gamma} \hat{f}(\gamma)^2 = 1 = \sum_{\gamma} |\hat{f}(\gamma)|.$$

For all $\gamma$ we have that $|\hat{f}(\gamma)| \in [0,1]$, so $\hat{f}(\gamma)^2 < |\hat{f}(\gamma)|$ unless $|\hat{f}(\gamma)| = 1$ or $\hat{f}(\gamma) = 0$, and the
proposition follows. □

Corollary 3.4 and Lemma 3.5 imply Theorem 1.1.

Proof of Theorem 1.1. Apply Corollary 3.4 iteratively on $f$. After less than $A^2$ steps, we are left
with a function $g$ which is a restriction of $f$ on an affine subspace defined by the restrictions so far,
such that $\|\hat{g}\|_1 = 1$. By Lemma 3.5, $g = \pm\chi_\alpha$ for some $\alpha \in \mathbb{Z}_2^n$. If $\alpha \ne 0$ we further restrict $g$ on
$\chi_\alpha = 1$ to get a restriction of $f$ which is constant. □

We note that the proof of Theorem 1.1 actually implies that $f$ is constant on a subspace of
co-dimension at most $\binom{A+1}{2}$. As mentioned earlier, a slight twist in the proof improves the
co-dimension to $O(A)$ [TWXZ13].
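The proof of Theorem 1.1 is constructive, and the iteration can be sketched in a few lines. The code below repeatedly restricts on $\chi_{\alpha+\beta} = b$ for the two heaviest coefficients, choosing the sign $b$ that yields the smaller spectral norm (a greedy choice that is at least as good as the one guaranteed by Corollary 3.4), until a single character remains. The dictionary representation of the spectrum, the choice of coset representatives and the names `restrict`, `constant_subspace` and `is_constant_on` are assumptions of this illustration, not the paper's notation.

```python
def chi(alpha, x):
    """Character chi_alpha(x) = (-1)^<alpha, x> for n-bit masks alpha and x."""
    return -1 if bin(alpha & x).count("1") % 2 else 1

def fourier(values, n):
    """Fourier coefficients of f, where values[x] = f(x) in {+1, -1}."""
    size = 1 << n
    return [sum(values[x] * chi(a, x) for x in range(size)) / size
            for a in range(size)]

def restrict(coeffs, gamma, b):
    """Non-zero coefficients after fixing chi_gamma = b.

    On that affine subspace chi_delta = b * chi_(delta ^ gamma), so every key
    containing the top bit of gamma is folded onto its partner.
    """
    top = gamma.bit_length() - 1
    out = {}
    for delta, c in coeffs.items():
        if (delta >> top) & 1:
            delta, c = delta ^ gamma, b * c
        out[delta] = out.get(delta, 0) + c
    return {delta: c for delta, c in out.items() if c != 0}

def constant_subspace(values, n):
    """Return constraints [(gamma, b), ...] such that f is constant where all hold."""
    coeffs = {a: c for a, c in enumerate(fourier(values, n)) if c != 0}
    constraints = []
    while len(coeffs) > 1:  # spectral norm is still above 1
        alpha, beta = sorted(coeffs, key=lambda a: (-abs(coeffs[a]), a))[:2]
        gamma = alpha ^ beta
        b = min((1, -1), key=lambda s: sum(
            abs(c) for c in restrict(coeffs, gamma, s).values()))
        coeffs = restrict(coeffs, gamma, b)
        constraints.append((gamma, b))
    (alpha, _), = coeffs.items()  # f is now +-chi_alpha on the subspace
    if alpha != 0:
        constraints.append((alpha, 1))
    return constraints

def is_constant_on(values, n, constraints):
    """Brute-force check that f takes one value on the affine subspace."""
    seen = {values[x] for x in range(1 << n)
            if all(chi(g, x) == b for g, b in constraints)}
    return len(seen) == 1

n = 3
and3 = [-1 if x == 0b111 else 1 for x in range(1 << n)]
and3_constraints = constant_subspace(and3, n)   # fixes a single input bit to 0
```

For AND on 3 bits ($A = 2.5$) a single restriction already suffices, well below the bound $A^2$ of Theorem 1.1.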

Proof of Theorem 1.2. Let

$$L(n, A) = \max_{\|\hat{f}\|_1 \le A} \mathrm{size}_\oplus(f).$$

We show, by induction on $n$, that $L(n, A) \le 2^{A^2} \cdot n^{2A}$.

For $n = 1$ the result is trivial.

Let $n > 1$ and further assume that $A > 1$ (if $A = 1$ then the claim follows from Lemma 3.5).
Let $\hat{f}(\alpha), \hat{f}(\beta)$ be the first and second largest Fourier coefficients in absolute value, respectively.
By Lemma 3.3 we are in one of the following cases:

1. $|\hat{f}(\alpha)| \ge 1/2$.

2. $1/2 > |\hat{f}(\alpha)| \ge 1/A$ and $|\hat{f}(\beta)| \ge \frac{1 - \hat{f}(\alpha)^2}{A} \ge \frac{3}{4A}$.

Consider the tree whose first query is the linear function $\chi_\gamma$ where $\gamma = \alpha + \beta$ (i.e. we branch
left or right according to the value of $\langle x, \gamma \rangle$). By the choice of $\gamma$ we obtain the following recursion:
In case 1,

$$L(n, A) \le L(n-1, A - 1/2) + L(n-1, A);$$

while in case 2,

$$L(n, A) \le L(n-1, A - 1/A) + L(n-1, A - 3/(4A)).$$

Note also that in the second case $A \ge 2$, or else $|\hat{f}(\alpha)| > 1/2$ by Lemma 3.3. Induction follows in
the first case as

$$\begin{aligned}
L(n, A) &\le L(n-1, A - 1/2) + L(n-1, A) \\
&\le 2^{(A-1/2)^2} \cdot (n-1)^{2(A-1/2)} + 2^{A^2} \cdot (n-1)^{2A} \\
&\le 2^{A^2} \cdot (n-1)^{2(A-1/2)} \left(1 + (n-1)\right) \\
&\le 2^{A^2} \cdot n^{2A}.
\end{aligned}$$

In the second case we have

$$\begin{aligned}
L(n, A) &\le L(n-1, A - 1/A) + L(n-1, A - 3/(4A)) \\
&\le 2^{(A-1/A)^2} \cdot (n-1)^{2(A-1/A)} + 2^{(A-3/(4A))^2} \cdot (n-1)^{2(A-3/(4A))} \\
&\le 2^{A^2} \cdot n^{2A},
\end{aligned}$$

where in the last inequality we used the fact that $A \ge 2$. □

As the AND function demonstrates, this argument gives a result that is tight up to a polynomial 
factor in some cases. 



Proof of Theorem 1.3. By Theorem 1.1 there exist $A^2$ linear functions $\alpha_1, \ldots, \alpha_{A^2}$ that can be fixed
to values $b_1, \ldots, b_{A^2}$, respectively, where $b_i \in \{+1,-1\}$ for $1 \le i \le A^2$, such that $f$ restricted to the
subspace $\{x \mid \chi_{\alpha_i}(x) = b_i, \ \forall 1 \le i \le A^2\}$ is constant. This implies that for any non-zero coefficient
$\hat{f}(\beta)$ there exists at least one other non-zero coefficient $\hat{f}(\beta + \gamma)$ for $\gamma \in \mathrm{span}\{\alpha_1, \ldots, \alpha_{A^2}\}$. Indeed,
if no such coefficient exists then the restriction $f|_{\chi_{\alpha_1}(x) = b_1, \ldots, \chi_{\alpha_{A^2}}(x) = b_{A^2}}$ will have the non-constant
term $\hat{f}(\beta) \cdot \chi_\beta$ (for example, this can be easily obtained from Equation (5)). Therefore, for any
other fixing of $\chi_{\alpha_1}, \ldots, \chi_{\alpha_{A^2}}$, both $\hat{f}(\beta)\chi_\beta$ and $\hat{f}(\beta + \gamma)\chi_{\beta+\gamma}$ collapse to the same (perhaps non-zero)
linear function, which implies that $\mathrm{spar}(f|_{\chi_{\alpha_1} = b'_1, \ldots, \chi_{\alpha_{A^2}} = b'_{A^2}}) \le \mathrm{spar}(f)/2$ for any choice of
$b'_1, \ldots, b'_{A^2}$. In other words, if we consider the tree of depth $A^2$ in which on level $i$ all nodes branch
according to $\langle \alpha_i, x \rangle$ then restricting $f$ to any path yields a new function with half the sparsity.
Thus, we can continue this process by induction for at most $\log s$ steps, until all the functions in
the leaves are constant. The resulting tree has depth at most $A^2 \log s$ as claimed. □



Our next goal is proving Theorem 1.4. To this end, we use a lemma which shows there exists
a low depth functional ⊕-DT which computes a function $g$ such that $\Pr_x[f(x) \ne g(x)] \le \epsilon$, where
$x$ is drawn from the uniform distribution over $\mathbb{Z}_2^n$. Recall that the bias of a Boolean function $f$ is
defined to be

$$\mathrm{bias}(f) \stackrel{\mathrm{def}}{=} \left| \Pr_x[f(x) = 1] - \Pr_x[f(x) = -1] \right|.$$

Alternatively, $\mathrm{bias}(f) = |\hat{f}(0)|$.

Lemma 3.6. Let $f : \mathbb{Z}_2^n \to \{+1,-1\}$ be a Boolean function with $\|\hat{f}\|_1 \le A$. Then, there exists a
functional ⊕-DT of depth at most $O(A^2 + \log(1/\epsilon))$ that computes a function $g$ such that $\Pr_x[f(x) \ne
g(x)] \le \epsilon$. Furthermore, the size of the tree is at most $2^{O(A^2)} \cdot \min\{1/\epsilon^2, O(\log(1/\epsilon))^{2A}\}$.

Proof. Let K = max 1 10^42, 2 log(l/e) } be a bound on the depth of the tree. In order to construct 
the functional decision tree, we use a recursive argument that stops whenever we reach a constant 
leaf, or after K levels of recursion, and then show that for a uniformly random x G Zj, x arrives 
at a highly biased leaf with probability > 1 — e, hence proving the statement of the lemma. 

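Let $\hat f(\alpha)$ be $f$'s largest coefficient in absolute value, and $\hat f(\beta)$ the second largest. Note that if $|\hat f(0)| \ge 1 - \epsilon$ we are done. Hence, we consider two cases:

1. $|\hat f(\alpha)| \ge 1 - \epsilon$ for $\alpha \ne 0$:

We first show that if $|\hat f(\alpha)| \ge 1 - \epsilon$ then $|\hat f(0)| \le \epsilon$. By considering $-f$ instead of $f$, if needed, we may assume without loss of generality $\hat f(\alpha) \ge 1 - \epsilon$. Note that

$$1 - \epsilon \le \hat f(\alpha) = \Pr[f = \chi_\alpha] - \Pr[f \ne \chi_\alpha] = (1 - \Pr[f \ne \chi_\alpha]) - \Pr[f \ne \chi_\alpha],$$

so $\Pr[f \ne \chi_\alpha] \le \epsilon/2$. Now, since $\mathbb{E}[\chi_\alpha] = 0$, we have

$$|\hat f(0)| = |\mathbb{E}[f]| = |\mathbb{E}[f] - \mathbb{E}[\chi_\alpha]| = |\mathbb{E}[f - \chi_\alpha]| \le \mathbb{E}[|f - \chi_\alpha|] = 2\Pr[f \ne \chi_\alpha] \le \epsilon.$$

In this case we query on $\chi_\alpha$. Note that no matter what value $\chi_\alpha$ obtains, the restricted function has bias at least $|\hat f(\alpha)| - |\hat f(0)| \ge 1 - 2\epsilon$, and we terminate the recursion.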



2. $|\hat f(\alpha)| < 1 - \epsilon$:

In this case we query on $\chi_{\alpha+\beta}$. Let $f' = f|_{\chi_{\alpha+\beta}=1}$ and $f'' = f|_{\chi_{\alpha+\beta}=-1}$. By Lemma 3.1, for at least one of $f'$ and $f''$, the spectral norm drops by at least $1/A$. We continue the construction by induction on $f'$ and $f''$, terminating when all the leaves are highly biased (in particular this includes the case of a constant leaf), or after at most $K$ levels of recursion.
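The branching step in case 2 can be sketched numerically. The following is a minimal illustration, not the paper's construction: it assumes the standard fact that restricting to $\chi_\gamma = b$ merges each pair of coefficients into $\hat f(\beta) + b\,\hat f(\beta+\gamma)$. The names `fourier`, `restricted_norm` and the example `maj3` are hypothetical.

```python
from itertools import product

def fourier(f, n):
    """Fourier coefficients of f: {0,1}^n -> {+1,-1}, by direct summation."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for alpha in points:
        total = 0
        for x in points:
            parity = sum(a * b for a, b in zip(alpha, x)) % 2
            total += f(x) * (-1 if parity else 1)
        coeffs[alpha] = total / len(points)
    return coeffs

def restricted_norm(coeffs, gamma, b):
    """Spectral norm of f restricted to chi_gamma = b, with b in {+1, -1}.

    Each coset {beta, beta + gamma} is visited twice by the loop,
    hence the final division by 2.
    """
    total = 0.0
    for beta, c in coeffs.items():
        shifted = tuple((u + v) % 2 for u, v in zip(beta, gamma))
        total += abs(c + b * coeffs[shifted])
    return total / 2

def maj3(x):
    # Hypothetical example: majority of three bits, "true" encoded as -1.
    return -1 if sum(x) >= 2 else 1

coeffs = fourier(maj3, 3)
A = sum(abs(c) for c in coeffs.values())
# Two largest coefficients in absolute value (ties broken by enumeration order).
alpha, beta = sorted(coeffs, key=lambda a: -abs(coeffs[a]))[:2]
gamma = tuple((u + v) % 2 for u, v in zip(alpha, beta))
norms = {b: restricted_norm(coeffs, gamma, b) for b in (+1, -1)}
```

For this example $A = 2$ and both restrictions have spectral norm 1, so the drop of at least $1/A$ on one branch is visible directly.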

It remains to be shown that the fraction of inputs $x \in \mathbb{Z}_2^n$ that arrive at an unbiased leaf is at most $\epsilon$. We say an internal node labeled $\chi_\gamma$ is norm-reducing for $x$ if $\chi_\gamma(x) = b$ and the restriction on $\chi_\gamma = b$ reduces the spectral norm by at least $1/A$. Clearly, a computation over any input $x$ which traverses $A^2$ norm-reducing nodes for $x$ arrives at a constant leaf. Furthermore, by construction, all the leaves which are not highly biased appear in the $K$-th level of the tree. Hence, an input which arrives at an unbiased node satisfies $K$ independent linear equations, of which at most $A^2$ are norm-reducing. Since for every fixed $0 \ne \gamma \in \mathbb{Z}_2^n$ and $b \in \{+1,-1\}$ the probability that $\chi_\gamma(x) = b$ is exactly $1/2$, the probability that $x$ arrives at a leaf that is not highly biased is bounded by*



$$2^{-K} \sum_{i=0}^{A^2} \binom{K}{i} \le 2^{-K/2} \le \epsilon$$

by the choice of $K$.



To prove the upper bound on the size of the tree we first note that $2^K$ is a trivial upper bound. Moreover, as in the proof of Theorem 1.2, the construction that we have satisfies the recursion formula

$$S(K-d, B) \le \max\{\, S(K-(d+1), B - 1/2) + S(K-(d+1), B),\; S(K-(d+1), B - 1/B) + S(K-(d+1), B - 3/(4B)) \,\},$$

where $S(K-d, B)$ stands for the number of leaves in the tree rooted at a node $v$ at depth $d$ such that the function $f_v$ computed at $v$ satisfies $\|\hat f_v\|_1 \le B$. As before, the solution to this recursion is $S(K, A) \le 2^{A^2} K^{2A}$. Overall, the size of the approximating parity decision tree is at most:

$$\min\{2^K, 2^{A^2} K^{2A}\} = \min\{\max\{2^{10A^2}, \epsilon^{-2}\},\; 2^{A^2} \cdot \max\{(2\log(1/\epsilon))^{2A}, (10A^2)^{2A}\}\}$$

$$\le \min\{2^{10A^2} \cdot \epsilon^{-2},\; 2^{A^2} \cdot (10A^2)^{2A} \cdot (2\log(1/\epsilon))^{2A}\}$$

$$\le 2^{O(A^2)} \cdot \min\{\epsilon^{-2}, O(\log(1/\epsilon))^{2A}\}$$

as claimed. □
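The choice $K = \max\{10A^2, 2\log(1/\epsilon)\}$ in the proof of Lemma 3.6 can be sanity-checked numerically. The sketch below counts paths of length $K$ with at most $A^2$ norm-reducing steps and compares their fraction with $\epsilon$. It assumes logarithms are base 2 and rounds $K$ up to an integer; both choices, and the function names, are assumptions made for illustration.

```python
from math import ceil, comb, log2

def depth_bound(A, eps):
    """K = max(10 A^2, 2 log2(1/eps)), rounded up to a whole number of levels."""
    return ceil(max(10 * A * A, 2 * log2(1 / eps)))

def unbiased_leaf_probability_bound(A, eps):
    """Fraction of length-K paths with at most A^2 norm-reducing steps."""
    K = depth_bound(A, eps)
    budget = int(A * A)
    paths = sum(comb(K, i) for i in range(budget + 1))
    return paths / 2 ** K

checks = {
    (A, eps): unbiased_leaf_probability_bound(A, eps)
    for A in (1, 2, 3)
    for eps in (0.5, 0.1, 0.01, 0.001)
}
```

For each pair in `checks` the computed fraction stays below the corresponding $\epsilon$, which is the property the proof needs from $K$.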

*We count how many words in $\{0,1\}^K$ with fewer than $A^2$ 1's are there.

Note that if we replace each highly biased function-labeled leaf in the functional $\oplus$-DT from Lemma 3.6 with the constant it is biased towards (i.e. by the sign of its constant term), the total error would increase by at most $\epsilon$. That is, it can be easily converted to a regular $\oplus$-DT of a function $g$ which $\epsilon$-approximates $f$. In fact, in the proof of Lemma 3.6 we could have continued the recursion until reaching a constant leaf or depth $K$, but for the sake of understanding the proof of Theorem 1.4 it may be clearer to keep the current version in mind.



The proof of Theorem 1.4 follows by combining Lemma 3.6 with the well known result of Goldreich and Levin [GL89] and of Kushilevitz and Mansour [KM93], who showed that given a query oracle to a function $f$, with high probability, one can approximate its large Fourier coefficients in polynomial time.

Lemma 3.7 ([GL89, KM93]). There exists a randomized algorithm, such that given a query oracle to a function $f : \mathbb{Z}_2^n \to \{+1,-1\}$, and parameters $\delta, \theta, \eta$, outputs, with probability at least $1 - \delta$, a list containing all of $f$'s Fourier coefficients whose absolute value is at least $\theta$. Furthermore, the algorithm outputs an additive approximation of at most $\eta$ to each of these coefficients. The algorithm runs in polynomial time in $n$, $1/\theta$, $1/\eta$ and $\log(1/\delta)$.



Proof of Theorem 1.4. We use the algorithm from Lemma 3.7 to find $f$'s largest Fourier coefficient in absolute value, $\hat f(\alpha)$. Whenever $|\hat f(\alpha)| < 1 - \epsilon$, Lemma 3.3 implies $|\hat f(\beta)| \ge \epsilon/A$, so the same algorithm can be used to find the second largest coefficient, $\hat f(\beta)$, in time $\mathrm{poly}(n, A, 1/\epsilon, \log(1/\delta))$. We use Lemma 3.6 to construct a functional $\oplus$-DT, and replace every function-labeled leaf with the constant it is biased towards. The bound on the running time follows from the size of the $\oplus$-DT and the running time of the algorithm from Lemma 3.7.

In fact, there is a slight inaccuracy in the argument above. Note that Lemma 3.7 only guarantees that we find a coefficient that is approximately the largest one. However, if it is the case that the second largest coefficient is very close to the largest one, then in Lemma 3.6 when we branch according to $\chi_{\alpha+\beta}$ both children have significantly smaller spectral norm.

If it is the case that we correctly identified the largest Fourier coefficient but failed to identify the second largest, then we note that if our approximation is good enough, say better than $\epsilon/2A$, then even if we are mistaken and branch according to $\chi_{\alpha+\beta'}$, where $|\hat f(\beta)| - |\hat f(\beta')| \le \epsilon/2A$, the argument in Lemma 3.6 still works, perhaps with a slightly worse constant in the big O. □



4 Functions over $\mathbb{Z}_p^n$ with small spectral norm

In this section, we extend our results to functions $f : \mathbb{Z}_p^n \to \{+1,-1\}$ where $p$ is any fixed prime. Throughout this section we assume $p > 2$. We start by giving some basic facts on the Fourier transform over $\mathbb{Z}_p^n$.

4.1 Preliminaries

Let $\omega = e^{2\pi i/p} \in \mathbb{C}$ be a primitive root of unity of order $p$. The set of $p^n$ group characters

$$\{\chi_\alpha : \mathbb{Z}_p^n \to \mathbb{C} \mid \alpha \in \mathbb{Z}_p^n\}$$

where $\chi_\alpha(x) = \omega^{\langle \alpha, x \rangle}$, is a basis for the vector space of functions from $\mathbb{Z}_p^n$ into $\mathbb{C}$, and is orthonormal with respect to the inner product $\langle f, g \rangle = \mathbb{E}_x[f(x)\overline{g(x)}]$.* We now have that $\hat f(\alpha) = \mathbb{E}_x[f(x)\overline{\chi_\alpha(x)}]$ and $f = \sum_{\alpha \in \mathbb{Z}_p^n} \hat f(\alpha)\chi_\alpha$. Plancherel's theorem holds here as well and the sparsity and $L_1$ norm are defined in the same way as they were defined for functions $f : \mathbb{Z}_2^n \to \{+1,-1\}$. Lemma 3.3 also extends to functions $f : \mathbb{Z}_p^n \to \{+1,-1\}$, with virtually the same proof. When $f$ is real-valued (and in particular, a Boolean function), then $\hat f(0) = \mathbb{E}[f]$ is real, and it can also be directly verified that

$$\hat f(\alpha) = \overline{\hat f(-\alpha)}.$$

*For a complex number $z$, we denote by $\bar z$ its complex conjugate.

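We have the analog to Equation (5):

$$f(x) = \sum_{\beta \in \mathbb{Z}_p^n/\langle\alpha\rangle} \left( \sum_{k=0}^{p-1} \hat f(\beta + k \cdot \alpha)\,(\chi_\alpha(x))^k \right) \chi_\beta(x). \qquad (10)$$

Hence, when $f$ is restricted to an affine subspace on which $\chi_\alpha = \omega^\lambda$ (where $0 \le \lambda \le p - 1$), then for every* $\beta \in \mathbb{Z}_p^n/\langle\alpha\rangle$ we have

$$\widehat{f|_{\chi_\alpha = \omega^\lambda}}(\beta) = \sum_{k=0}^{p-1} \hat f(\beta + k\alpha)\,\omega^{\lambda k}.$$

For every $\beta \in \mathbb{Z}_p^n$, we denote by $[\beta]_\alpha = \beta + \langle\alpha\rangle$ the coset of $\langle\alpha\rangle$ in which $\beta$ resides. Lemma 3.2 now becomes:

$$\sum_{\alpha \in \mathbb{Z}_p^n} \hat f(\alpha)\hat f(\beta - \alpha) = 0 \qquad (11)$$

for all $0 \ne \beta \in \mathbb{Z}_p^n$.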
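The preliminaries above can be checked on a small example. The following minimal sketch computes the Fourier transform over $\mathbb{Z}_p^n$ by direct summation and evaluates the convolution sum of Equation (11) as well as the conjugate symmetry of real-valued functions. The function names and the particular $\pm 1$ function `f` on $\mathbb{Z}_3^2$ are hypothetical choices for illustration.

```python
import cmath
from itertools import product

def fourier_zp(f, n, p):
    """f_hat(alpha) = E_x[f(x) * conj(chi_alpha(x))] with chi_alpha(x) = omega^<alpha,x>."""
    omega = cmath.exp(2j * cmath.pi / p)
    points = list(product(range(p), repeat=n))
    coeffs = {}
    for alpha in points:
        total = 0
        for x in points:
            inner = sum(a * b for a, b in zip(alpha, x)) % p
            total += f(x) * omega ** (-inner)
        coeffs[alpha] = total / len(points)
    return coeffs

def convolution(coeffs, beta, p):
    """sum over alpha of f_hat(alpha) * f_hat(beta - alpha)."""
    return sum(
        c * coeffs[tuple((b - a) % p for a, b in zip(alpha, beta))]
        for alpha, c in coeffs.items()
    )

def f(x):
    # Hypothetical Boolean (+1/-1 valued) function on Z_3^2.
    return 1 if (x[0] * x[1] + x[0]) % 3 == 0 else -1

p, n = 3, 2
coeffs = fourier_zp(f, n, p)
zero = (0,) * n
```

Since $f^2 \equiv 1$, `convolution(coeffs, beta, p)` vanishes (up to rounding) for every nonzero `beta` and equals 1 at `zero`.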

As a generalization of the $\oplus$-DT model, we define a $p$-ary linear decision tree, denoted $\oplus_p$-DT, to be a computation tree where every internal node $v$ is labeled by a linear function $\gamma \in \mathbb{Z}_p^n$ and has $p$ children. The edges between $v$ and its children are labeled $0, 1, \ldots, p-1$, and on an input $x$, it computes $\langle \gamma, x \rangle \bmod p$ and branches accordingly. We carry along from the binary case the notation $D_{\oplus_p}(f)$ and $\mathrm{size}_{\oplus_p}(f)$, and define them to be the depth (respectively, size) of a minimal-depth (resp. size) $\oplus_p$-DT computing $f$.

4.2 Basic tools 

In this section we prove the basic tools required for generalizing the theorems for functions defined on $\mathbb{Z}_2^n$ to functions $f : \mathbb{Z}_p^n \to \{+1,-1\}$. As a generalization of Lemma 3.1, we show a slightly more complex and detailed argument:

Lemma 4.1 (Main Lemma for functions over $\mathbb{Z}_p^n$). Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$ be a non-constant Boolean function such that $\|\hat f\|_1 = A$. Let $\hat f(\alpha)$ be its largest coefficient in absolute value, and $\hat f(\beta)$ be the second largest. Then there exist a universal constant $c_0$ and a constant $c_1 = c_1(p) = \Theta(1/p^2)$ such that

1. For all $\lambda \in \mathbb{Z}_p$, $\|\widehat{f|_{\chi_{\beta-\alpha}=\omega^\lambda}}\|_1 \le A - c_0|\hat f(\beta)|$.

2. There exist at least $m := \lfloor p/3 \rfloor$ distinct elements $\lambda_1, \ldots, \lambda_m \in \mathbb{Z}_p$ such that $\|\widehat{f|_{\chi_{\beta-\alpha}=\omega^{\lambda_k}}}\|_1 \le A - c_0|\hat f(\alpha)| \le A - c_0/A$ for $k = 1, \ldots, m$.

3. There exist at least $p - 1$ distinct elements $\lambda_1, \ldots, \lambda_{p-1} \in \mathbb{Z}_p$ such that $\|\widehat{f|_{\chi_{\beta-\alpha}=\omega^{\lambda_k}}}\|_1 \le A - c_1 \cdot |\hat f(\alpha)| \le A - c_1/A$ for $k = 1, \ldots, p-1$.



*Recall that $\langle\alpha\rangle$ is the additive group generated by $\alpha$ and $\mathbb{Z}_p^n/\langle\alpha\rangle$ is the set of cosets of $\langle\alpha\rangle$.

As before we first prove a claim characterizing functions with very small spectral norm. Observe that when $p > 2$, the characters themselves are not Boolean functions any more. The following is a variant of Lemma 3.5 for $\mathbb{Z}_p^n$ with $p > 2$.

Lemma 4.2. Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$ be a Boolean function such that $\|\hat f\|_1 = 1$. Then $f = \pm 1$.

Proof. Once more, using Parseval's identity and the assumption:

$$\sum_\gamma |\hat f(\gamma)|^2 = 1 = \sum_\gamma |\hat f(\gamma)|.$$

As before, $|\hat f(\gamma)| \in [0, 1]$, which implies $|\hat f(\alpha)| = 1$ for exactly one $\alpha \in \mathbb{Z}_p^n$, i.e. $f = z \cdot \chi_\alpha$ where $z \in \mathbb{C}$ and $|z| = 1$. Since $f$ is Boolean and $f(0) = z$, we get $z = \pm 1$, and $\pm\chi_\alpha$ is Boolean (when $p > 2$) only when $\alpha = 0$. □

The following is a purely geometric lemma we use in our analysis. Since the Fourier coefficients are now complex numbers, we need to bound the decrease in the spectral norm when two coefficients that are not aligned in the same direction collapse to the same coefficient.

Lemma 4.3. Let $z_1, z_2 \in \mathbb{C}$ such that $|z_1| = R$, $|z_2| = r$ and $r \le R$. Suppose the angle between $z_1$ and $z_2$ is $\theta$. Then, for $C = C(\theta) = (1 - \cos(\theta))/2$ it holds that

$$|z_1| + |z_2| - |z_1 + z_2| \ge Cr.$$

We give the simple proof in Appendix B.
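The inequality of Lemma 4.3 is easy to probe numerically. The sketch below is only a spot check on a grid of radii and angles, not a proof; the function names and the grid are illustrative assumptions.

```python
import cmath
import math

def collapse_loss(z1, z2):
    """Loss in total absolute value when z1 and z2 merge into z1 + z2."""
    return abs(z1) + abs(z2) - abs(z1 + z2)

def lemma_4_3_bound(z1, z2):
    """C(theta) * r with C(theta) = (1 - cos(theta)) / 2 and r the smaller modulus."""
    r = min(abs(z1), abs(z2))
    theta = abs(cmath.phase(z2 / z1))  # angle between z1 and z2, in [0, pi]
    return (1 - math.cos(theta)) / 2 * r

samples = []
for r in (0.1, 0.5, 1.0):
    for k in range(13):
        theta = k * math.pi / 12
        z1 = cmath.rect(1.0, 0.7)           # |z1| = R = 1, arbitrary direction
        z2 = cmath.rect(r, 0.7 + theta)     # |z2| = r <= R, at angle theta from z1
        samples.append((collapse_loss(z1, z2), lemma_4_3_bound(z1, z2)))
```

On every sample the loss is at least the bound; at $\theta = \pi$ the loss is $2r$ while the bound is $r$.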

The next lemma is similar to the inequalities of the type we used in the proof of Lemma 3.1.

Lemma 4.4. Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$ be a non-constant Boolean function, and suppose $\hat f(0)$ is the largest Fourier coefficient in absolute value and $\hat f(\beta)$ is the second largest. Then

$$2|\hat f(0)| \le \sum_{\substack{\gamma \in \mathbb{Z}_p^n \\ \gamma \ne 0, \beta}} \min\{|\hat f(\gamma)|, |\hat f(\gamma - \beta)|\}.$$

Proof. By rearranging Equation (11) with respect to $\beta$, we get:

$$|2\hat f(0)\hat f(\beta)| = \Big| \sum_{\gamma \ne 0, \beta} \hat f(\gamma)\hat f(\beta - \gamma) \Big|.$$

Now apply the triangle inequality to the right hand side, and then utilize the fact that $\hat f(\beta)$ is the second largest in absolute value and $\hat f(0)$ does not appear in the right hand side, to obtain

$$2|\hat f(0)||\hat f(\beta)| \le |\hat f(\beta)| \sum_{\substack{\gamma \in \mathbb{Z}_p^n \\ \gamma \ne 0, \beta}} \min\{|\hat f(\gamma)|, |\hat f(\beta - \gamma)|\}.$$

Since $f$ is real-valued, $\hat f(\beta - \gamma) = \overline{\hat f(\gamma - \beta)}$ (and in particular, they have the same absolute value), and since $f$ is non-constant, by Lemma 4.2 we have $\|\hat f\|_1 > 1$, i.e. $\hat f(\beta) \ne 0$, which implies the desired inequality. □

When analyzing the loss in the $L_1$ norm which is caused by restriction on $\chi_\eta$, it will be convenient to sum over the individual losses on pairs $\hat f(\gamma), \hat f(\gamma + \eta)$ that collapse to the same coefficient. However, letting $\gamma$ run over all of $\mathbb{Z}_p^n$ these pairs are not pairwise disjoint, so we might over-count the losses. The following lemma generously accounts for such over-counting issues, by showing that summing over all (not pairwise disjoint) pairs differs from the true counting by at most a constant factor.

Lemma 4.5. Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$, $0 \ne \eta \in \mathbb{Z}_p^n$, and $\lambda \in \mathbb{Z}_p$. If

$$\sum_{\gamma \in \mathbb{Z}_p^n} |\hat f(\gamma)| + |\hat f(\gamma + \eta)| - |\hat f(\gamma) + \omega^\lambda \hat f(\gamma + \eta)| = B,$$

then

$$\sum_{\gamma \in \mathbb{Z}_p^n/\langle\eta\rangle} \left( \sum_{k=0}^{p-1} |\hat f(\gamma + k\eta)| - \Big| \sum_{k=0}^{p-1} \hat f(\gamma + k\eta)\,\omega^{\lambda k} \Big| \right) \ge B/3.$$

Note that the left hand side of the last inequality is exactly the loss in the $L_1$ norm when restricting $f$ on $\chi_\eta = \omega^\lambda$. We defer the proof of Lemma 4.5 to Appendix C.



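Lemma 4.6. Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$ be a non-constant Boolean function such that $\|\hat f\|_1 = A$. Let $\hat f(\alpha)$ be its largest coefficient in absolute value, and $\hat f(\beta)$ be the second largest. Suppose $\lambda \in \mathbb{Z}_p$ is such that the angle between $\hat f(\alpha)$ and $\omega^\lambda \hat f(\beta)$ in absolute value is at most $\pi/3$. Then there is a universal constant $c > 0$ such that

$$\|\widehat{f|_{\chi_{\beta-\alpha}=\omega^\lambda}}\|_1 \le A - c|\hat f(\alpha)| \le A - c/A.$$

Proof. Denote $\eta := \beta - \alpha$. Under the assumption of the lemma, noting that* $\overline{\omega^\lambda \hat f(\beta)} = \omega^{-\lambda}\hat f(-\beta)$, we have

$$\mathrm{Re}\big[\hat f(\alpha) \cdot \omega^{-\lambda}\hat f(-\beta)\big] \ge \cos(\pi/3) \cdot |\hat f(\alpha) \cdot \omega^{-\lambda}\hat f(-\beta)| = \cos(\pi/3)|\hat f(\alpha)||\hat f(\beta)|. \qquad (12)$$

By Equation (11), with respect to $-\eta \ne 0$, we have

$$c_0 \cdot \hat f(\alpha)\hat f(-\beta) = -\sum_{\gamma \ne \alpha, -\beta} \hat f(\gamma)\hat f(-\eta - \gamma),$$

where $c_0 = 1$ if $\alpha = -\beta$ and $c_0 = 2$ otherwise. Multiplying by $\omega^{-\lambda}$ and taking the real part of both sides gives

$$\mathrm{Re}\big(c_0 \cdot \omega^{-\lambda}\hat f(\alpha)\hat f(-\beta)\big) = \sum_{\gamma \ne \alpha, -\beta} -\mathrm{Re}\big(\hat f(\gamma)\,\omega^{-\lambda}\hat f(-\eta - \gamma)\big). \qquad (13)$$

Let $N_\eta := \{\gamma \mid \mathrm{Re}(\hat f(\gamma)\,\omega^{-\lambda}\hat f(-\eta - \gamma)) < 0,\ \gamma \ne \alpha, -\beta\}$. Then (13), as well as the fact that $c_0 \in \{1, 2\}$ and the left hand side is positive (by (12)), imply

$$\mathrm{Re}\big(\omega^{-\lambda}\hat f(\alpha)\hat f(-\beta)\big) \le \sum_{\gamma \in N_\eta} -\mathrm{Re}\big(\hat f(\gamma)\,\omega^{-\lambda}\hat f(-\eta - \gamma)\big). \qquad (14)$$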



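*$\mathrm{Re}(z)$ is the real part of a complex number $z$ and $\bar z$ is its conjugate.

Note that for every pair $(\hat f(\gamma), \omega^{-\lambda}\hat f(-\eta - \gamma))$, where $\gamma \in N_\eta$, the angle between $\hat f(\gamma)$ and $\overline{\omega^{-\lambda}\hat f(-\eta - \gamma)}$ ($= \omega^\lambda \hat f(\eta + \gamma)$) is in the range $[\pi/2, 3\pi/2]$. Furthermore, when applying the restriction $\chi_\eta = \omega^\lambda$ each such pair collapses to the same coefficient, and since the angle between the two coefficients (in absolute value) is at least $\pi/2$, by Lemma 4.3 it follows that for all $\gamma \in N_\eta$

$$|\hat f(\gamma)| + |\hat f(\eta + \gamma)| - |\hat f(\gamma) + \omega^\lambda \hat f(\eta + \gamma)| \ge \min\{|\hat f(\gamma)|, |\hat f(\eta + \gamma)|\} \cdot C(\pi/2) \qquad (15)$$

(where $C(\theta) = (1 - \cos(\theta))/2$ is as defined in Lemma 4.3). Using (15) to bound the loss on every coefficient, and summing over all $\gamma \in \mathbb{Z}_p^n$ (while bearing in mind that every summand is non-negative), we have

$$\sum_{\gamma} |\hat f(\gamma)| + |\hat f(\eta + \gamma)| - |\hat f(\gamma) + \omega^\lambda \hat f(\eta + \gamma)|$$

$$\ge \sum_{\gamma \in N_\eta} |\hat f(\gamma)| + |\hat f(\eta + \gamma)| - |\hat f(\gamma) + \omega^\lambda \hat f(\eta + \gamma)|$$

$$\ge \sum_{\gamma \in N_\eta} \min\{|\hat f(\gamma)|, |\hat f(\eta + \gamma)|\} \cdot C(\pi/2).$$

Since neither $\alpha$ nor $-\beta$ appear in $N_\eta$, and $\hat f(\beta)$ is the second largest coefficient, for all $\gamma \in N_\eta$ it holds that

$$\min\{|\hat f(\gamma)|, |\hat f(\eta + \gamma)|\} \ge \frac{|\hat f(\gamma)||\hat f(\eta + \gamma)|}{|\hat f(\beta)|},$$

so

$$\sum_{\gamma} |\hat f(\gamma)| + |\hat f(\eta + \gamma)| - |\hat f(\gamma) + \omega^\lambda \hat f(\eta + \gamma)| \ge \frac{C(\pi/2)}{|\hat f(\beta)|} \cdot \sum_{\gamma \in N_\eta} |\hat f(\gamma)||\hat f(\eta + \gamma)|. \qquad (16)$$

Taking the complex conjugate and multiplying by $\omega^{-\lambda}$, it is also clear that for all $\gamma$

$$|\hat f(\gamma)||\hat f(\eta + \gamma)| = |\hat f(\gamma)||\omega^{-\lambda}\hat f(-\eta - \gamma)| = |\hat f(\gamma)\,\omega^{-\lambda}\hat f(-\eta - \gamma)| \ge -\mathrm{Re}\big(\hat f(\gamma)\,\omega^{-\lambda}\hat f(-\eta - \gamma)\big).$$

Hence (16) and (14) imply

$$\sum_{\gamma} |\hat f(\gamma)| + |\hat f(\eta + \gamma)| - |\hat f(\gamma) + \omega^\lambda \hat f(\eta + \gamma)| \ge \frac{C(\pi/2)}{|\hat f(\beta)|} \cdot \sum_{\gamma \in N_\eta} -\mathrm{Re}\big(\hat f(\gamma)\,\omega^{-\lambda}\hat f(-\eta - \gamma)\big)$$

$$\ge \frac{C(\pi/2)}{|\hat f(\beta)|} \cdot \mathrm{Re}\big(\omega^{-\lambda}\hat f(\alpha)\hat f(-\beta)\big).$$

And (12) now gives

$$\sum_{\gamma} |\hat f(\gamma)| + |\hat f(\eta + \gamma)| - |\hat f(\gamma) + \omega^\lambda \hat f(\eta + \gamma)| \ge \frac{C(\pi/2) \cdot \cos(\pi/3) \cdot |\hat f(\alpha)| \cdot |\hat f(\beta)|}{|\hat f(\beta)|} = c|\hat f(\alpha)|,$$

where $c$ is an absolute constant. By Lemma 4.5 the $L_1$ norm of the restricted function has decreased by at least $c|\hat f(\alpha)|/3$. □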

We are now ready to prove the main lemma for functions over $\mathbb{Z}_p^n$.

Proof of Lemma 4.1. Let $\eta = \beta - \alpha$ as in Lemma 4.6. Let $\lambda \in \mathbb{Z}_p$, and consider the restriction $\chi_\eta = \omega^\lambda$. Let $\theta$ be the angle between $\hat f(\alpha)$ and $\omega^\lambda \hat f(\alpha + \eta) = \omega^\lambda \hat f(\beta)$. If $\theta$ is larger in absolute value than $\pi/3$, then under the restriction the coefficients $\hat f(\alpha)$ and $\hat f(\beta)$ collapse into the same coefficient, resulting by Lemma 4.3 in a $C(\pi/3) \cdot |\hat f(\beta)|$ loss in the $L_1$ norm (where $C(\cdot)$ is as stated in Lemma 4.3). If $|\theta| \le \pi/3$ then Lemma 4.6 implies a loss of $c_0|\hat f(\alpha)|$ (which is also at least $c_0|\hat f(\beta)|$) in the $L_1$ norm where $c_0$ is an absolute constant. This completes Item 1 in the proof.

Furthermore, since multiplication by $\omega$ rotates $\hat f(\beta)$ by $2\pi/p$, there exist at least $\lfloor p/3 \rfloor$ values for $\lambda \in \mathbb{Z}_p$ such that $|\theta|$ would be at most $\pi/3$, which completes Item 2 in the proof.
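The counting step behind Item 2 (rotations by multiples of $2\pi/p$ land within angle $\pi/3$ of a fixed direction at least $\lfloor p/3 \rfloor$ times) can be illustrated as follows. This is a minimal sketch; the function name, the sample values standing in for $\hat f(\alpha)$ and $\hat f(\beta)$, and the small floating-point tolerance are assumptions.

```python
import cmath
import math

def aligned_rotations(z_alpha, z_beta, p):
    """All lam in Z_p with angle between z_alpha and omega^lam * z_beta at most pi/3."""
    omega = cmath.exp(2j * cmath.pi / p)
    good = []
    for lam in range(p):
        theta = abs(cmath.phase(omega ** lam * z_beta / z_alpha))
        if theta <= math.pi / 3 + 1e-12:  # tolerance for rounding at the boundary
            good.append(lam)
    return good

counts = {}
for p in (3, 5, 7, 11, 13):
    for j in range(20):
        z_beta = cmath.rect(0.3, 0.37 * j)  # hypothetical second-largest coefficient
        counts[(p, j)] = len(aligned_rotations(0.6, z_beta, p))
```

In every sampled configuration the number of aligned values of $\lambda$ is at least $\lfloor p/3 \rfloor$.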



Next, we prove Item 3. Let $C = C(\pi/p) = (1 - \cos(\pi/p))/2 = \Theta(1/p^2)$ as in Lemma 4.3. We distinguish between two cases: The first case we consider is $\alpha \ne 0$. In this case, by the fact that $f$ is real-valued, $|\hat f(-\alpha)| = |\hat f(\alpha)|$. So $\beta = -\alpha$ and by Item 1, restricting on $\chi_\eta = \omega^\lambda$, for any $\lambda \in \mathbb{Z}_p$, yields

$$\|\widehat{f|_{\chi_\eta = \omega^\lambda}}\|_1 \le \|\hat f\|_1 - c_0 \cdot |\hat f(\alpha)|, \qquad (17)$$

which implies Item 3 for this case.

The second case is that the largest Fourier coefficient in absolute value is achieved on $\alpha = 0$. In this case $\beta \ne 0$, and $\eta = \beta$. By the assumption $\|\hat f\|_1 > 1$, we have $|\hat f(\beta)| > 0$. We define the weight of a pair $\{\gamma, \gamma - \beta\} \subseteq \mathbb{Z}_p^n$ to be $w(\gamma) = \min\{|\hat f(\gamma)|, |\hat f(\gamma - \beta)|\}$ and denote

$$W = \sum_{\gamma \ne 0, \beta} w(\gamma).$$

Thus by Lemma 4.4, we have

$$2|\hat f(0)| \le W. \qquad (18)$$

Note that when restricting $f$ on $\chi_\beta = \omega^\lambda$, $\hat f(\gamma)$ and $\hat f(\gamma - \beta)$ collapse to the same coefficient. The new coefficient becomes $|\hat f(\gamma) + \omega^\lambda \hat f(\gamma + \beta) + \cdots + \omega^{\lambda(p-1)} \hat f(\gamma + (p-1)\beta)|$. We analyze only the loss in the $L_1$ norm obtained from the collapse of $\hat f(\gamma)$ and $\hat f(\gamma + (p-1)\beta) = \hat f(\gamma - \beta)$ to the same coefficient. Let $\theta$ be the angle between $\hat f(\gamma)$ and $\hat f(\gamma - \beta)$. Since multiplication by $\omega$ is equivalent to rotation by $2\pi/p$, as $\lambda$ traverses over $0, 1, \ldots, p-1$, the angle between $\hat f(\gamma)$ and $\omega^{\lambda(p-1)} \hat f(\gamma - \beta)$ attains all possible values $\theta + 2k\pi/p$ for $k = 0, 1, \ldots, p-1$. Hence, there exists at most one choice of $\lambda$ such that the angle between $\hat f(\gamma)$ and $\omega^{\lambda(p-1)} \hat f(\gamma - \beta)$ is less than $\pi/p$. We call $\lambda \in \mathbb{Z}_p$ good with respect to $\gamma$ if the angle between $\hat f(\gamma)$ and $\omega^{\lambda(p-1)} \hat f(\gamma - \beta)$ is at least $\pi/p$. If we fix $\beta$, then for every pair there exist at least $p - 1$ good elements in $\mathbb{Z}_p$. Intuitively, each element $\lambda$ which is good guarantees a loss of at least $C \cdot \min\{|\hat f(\gamma)|, |\hat f(\gamma - \beta)|\} = C \cdot w(\gamma)$ in the spectral norm (the actual analysis, which will now follow, is a bit more delicate).

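Consider now the matrix $M$ whose rows are indexed by elements $\gamma \in \mathbb{Z}_p^n$ for all $\gamma \ne 0, \beta$, and whose columns are indexed by all elements $\lambda \in \mathbb{Z}_p$. We define:

$$M_{\gamma,\lambda} = \begin{cases} w(\gamma) & \text{if } \lambda \text{ is good with respect to } \gamma \\ 0 & \text{otherwise} \end{cases}$$

Since for every $\gamma$ there are at least $p - 1$ good elements, we have

$$\sum_{\gamma, \lambda} M_{\gamma,\lambda} \ge (p-1) \sum_{\gamma \ne 0, \beta} w(\gamma) = (p-1)W. \qquad (19)$$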

While for every fixed column $\lambda_0$,

$$\sum_{\gamma} M_{\gamma,\lambda_0} \le W. \qquad (20)$$

As there are $p$ columns, (19) and (20) together imply that there is at most one column in which the total weight is less than $W/2$, i.e. for all $\lambda \in \mathbb{Z}_p$ but at most one, it holds that

$$\sum_{\gamma} M_{\gamma,\lambda} \ge W/2. \qquad (21)$$

Every element $\lambda \in \mathbb{Z}_p$ which satisfies (21) will be called good. We thus proved the existence of at least $p - 1$ good elements $\lambda$.

We now fix a good element $\lambda$ and consider the restriction $\chi_\beta = \omega^\lambda$. By Lemma 4.5 the loss of the spectral norm under this restriction is at least

$$\frac{1}{3} \cdot \sum_{\gamma} |\hat f(\gamma)| + |\hat f(\gamma - \beta)| - |\hat f(\gamma) + \omega^{\lambda(p-1)} \hat f(\gamma - \beta)|,$$

which is, by Lemma 4.3 and the definition of $M_{\gamma,\lambda}$, at least

$$\frac{1}{3} \cdot \sum_{\gamma :\, \lambda \text{ is good w.r.t. } \gamma} C \cdot \min\{|\hat f(\gamma)|, |\hat f(\gamma - \beta)|\} = \frac{1}{3} \cdot \sum_{\gamma} C \cdot M_{\gamma,\lambda} \ge C \cdot W/6 \ge |\hat f(0)| \cdot C/3,$$

where we used (21) and (18) for the penultimate and last inequalities, respectively. Letting $c_1 = C/3$ completes the proof of the lemma. □



4.3 Analogs of Theorems 1.1, 1.2, 1.3 and 1.4

Theorems 4.7, 4.8, 4.10 and 4.11 now follow as consequences of Lemma 4.1. Their proofs use the same arguments we used to deduce their $\mathbb{Z}_2^n$ counterparts from Lemma 3.1. We use the notation $O_p(\cdot)$ when the underlying constant depends on $p$, whereas when we use $O(\cdot)$, the underlying constant is some absolute constant.

Theorem 4.7. Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$ be a Boolean function with $\|\hat f\|_1 = A$. Then there exists an affine subspace $V \subseteq \mathbb{Z}_p^n$ of co-dimension at most $O(A^2)$ such that $f$ is constant on $V$.

Proof. Apply Lemma 4.1 iteratively on $f$. By assumption $p > 2$, so $\lfloor p/3 \rfloor \ge 1$, and then using Item 2 in the lemma, after at most $A^2/c_0$ steps, we are left with a function $g$ which is a restriction of $f$ on an affine subspace defined by the restrictions so far, such that $\|\hat g\|_1 = 1$. By Lemma 4.2, $g = \pm 1$. □



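Theorem 4.8. Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$ be a Boolean function with $\|\hat f\|_1 = A$. Then $\mathrm{size}_{\oplus_p}(f) \le p^{O_p(A^2)} n^{O_p(A)}$.

Proof. By Lemma 4.1, there is a constant $0 < c < 1$ (where $c := \min\{c_0, c_1\}$ depends only on $p$), a linear function $\gamma \in \mathbb{Z}_p^n$ and $\lambda_1, \ldots, \lambda_{p-1} \in \mathbb{Z}_p$ such that $\|\widehat{f|_{\chi_\gamma = \omega^{\lambda_j}}}\|_1 \le A - c|\hat f(\alpha)|$ for all $1 \le j \le p-1$. Furthermore, for the $p$-th direction $\lambda_p$, the same lemma shows that $\|\widehat{f|_{\chi_\gamma = \omega^{\lambda_p}}}\|_1 \le A - c|\hat f(\beta)|$ where $\hat f(\beta)$ is the second largest coefficient.

As before, let

$$L(n, A) := \max_{\substack{f : \mathbb{Z}_p^n \to \{+1,-1\} \\ \|\hat f\|_1 \le A}} \mathrm{size}_{\oplus_p}(f).$$

We show, by induction on $n$, that $L(n, A) \le p^{2A^2/c} n^{2A/c}$. For $n = 1$ the result is trivial. Let $n > 1$ and further assume that $A > 1$ (if $A = 1$ then the claim follows from Lemma 4.2).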
As in the proof of Theorem 1.2 we consider two cases:

1. $|\hat f(\alpha)| \ge 1/2$

2. $1/2 > |\hat f(\alpha)| \ge 1/A$ and $|\hat f(\beta)| \ge 3/(4A)$.

We again consider the tree whose first query is the linear function $\chi_\gamma$. In the first case, by the choice of $\gamma$ we obtain the following recursion:

$$L(n, A) \le (p-1)L(n-1, A - c/2) + L(n-1, A).$$

The induction hypothesis then implies (using the assumption that $A > 1$)

$$L(n, A) \le (p-1)p^{2(A-c/2)^2/c}(n-1)^{2(A-c/2)/c} + p^{2A^2/c}(n-1)^{2A/c}$$

$$\le (p-1)p^{(2A^2/c)-1}(n-1)^{2A/c-1} + p^{2A^2/c}(n-1)^{2A/c}$$

$$\le p^{2A^2/c}(n-1)^{2A/c-1}(1 + (n-1)) \le p^{2A^2/c} n^{2A/c}.$$

While in the second case, we have the recurrence

$$L(n, A) \le (p-1)L(n-1, A - c/A) + L(n-1, A - 3c/(4A)) \le p \cdot L(n-1, A - 3c/(4A)).$$

Again, the induction hypothesis implies (using the assumption that $A > 1$)

$$L(n, A) \le p \cdot p^{2(A-3c/(4A))^2/c}(n-1)^{2(A-(3c/(4A)))/c} \le n^{2A/c} \cdot p^{1 + 2A^2/c - 3 + 18c/(16A^2)} \le p^{2A^2/c} n^{2A/c}.$$

□

As an immediate corollary, we get: 

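Corollary 4.9. Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$ be a Boolean function with $\|\hat f\|_1 = A$. Then $f = \sum_{i=1}^{p^{O_p(A^2)} n^{O_p(A)}} \pm \mathbb{1}_{V_i}$, where each $V_i$ is an affine subspace of $\mathbb{Z}_p^n$.

Theorem 4.10. Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$ be such that $\|\hat f\|_1 = A$ and $|\{\alpha \mid \hat f(\alpha) \ne 0\}| = s$. Then $f$ can be computed by a $\oplus_p$-DT of depth $O(A^2 \log s)$.

Proof. By Theorem 4.7, there exist $K = O(A^2)$ linear functions $\alpha_1, \ldots, \alpha_K$ which can be fixed to values $\omega^{\lambda_1}, \ldots, \omega^{\lambda_K}$ where $\lambda_j \in \mathbb{Z}_p$ for $1 \le j \le K$, such that $f$ restricted to the subspace $\{x \mid \chi_{\alpha_j}(x) = \omega^{\lambda_j},\ \forall\, 1 \le j \le K\}$ is constant. Once again, this implies that for any non-zero coefficient $\hat f(\beta)$ there exists at least one other non-zero coefficient $\hat f(\beta + \gamma)$ for $\gamma \in \mathrm{span}\{\alpha_1, \ldots, \alpha_K\}$, since if no such coefficient exists then the restriction $f|_{\chi_{\alpha_1} = \omega^{\lambda_1}, \ldots, \chi_{\alpha_K} = \omega^{\lambda_K}}$ will have the non-constant term $\hat f(\beta) \cdot \chi_\beta$. Therefore, for any other fixing of $\chi_{\alpha_1}, \ldots, \chi_{\alpha_K}$, both $\hat f(\beta)\chi_\beta$ and $\hat f(\beta + \gamma)\chi_{\beta+\gamma}$ collapse to the same (perhaps non-zero) linear function, which implies that $\mathrm{spar}(f|_{\chi_{\alpha_1} = \omega^{\lambda'_1}, \ldots, \chi_{\alpha_K} = \omega^{\lambda'_K}}) \le \mathrm{spar}(f)/2$ for any choice of $\lambda'_1, \ldots, \lambda'_K$. Thus, we can continue by induction until all the functions in the leaves are constant. □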

Theorem 4.11. Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$ be such that $\|\hat f\|_1 = A$. Then for every $\delta, \epsilon > 0$ there is a randomized algorithm that given a query oracle to $f$ outputs (with probability at least $1 - \delta$) a $\oplus_p$-DT of depth $O(A^2 + \log(1/\epsilon))$ and size $\min\{p^{O(A^2 + \log(1/\epsilon))},\ p^{O_p(A^2)} \cdot O(A^2 + \log(1/\epsilon))^{O_p(A)}\}$ that computes a Boolean function $g_\epsilon$ such that $\mathrm{dist}(f, g_\epsilon) \le \epsilon$. The algorithm runs in polynomial time in $n$, $\exp(A^2)$, $1/\epsilon$ and $\log(1/\delta)$.

The proof of Theorem 4.11 follows the same outline as the proof of Theorem 1.4. A functional $\oplus_p$-DT is defined as a $\oplus_p$-DT where we allow every leaf to be labeled by a Boolean function on $\mathbb{Z}_p^n$, and the bias of a function $f : \mathbb{Z}_p^n \to \{+1,-1\}$ is defined as in the binary case. We again show there exists a low depth $\oplus_p$-DT which computes a function $g$ such that $\Pr_x[f(x) \ne g(x)] \le \epsilon$ (where $x$ is drawn from the uniform distribution over $\mathbb{Z}_p^n$).



Lemma 4.12. Let $f : \mathbb{Z}_p^n \to \{+1,-1\}$ be a Boolean function with $\|\hat f\|_1 \le A$. Then there exists a functional $\oplus_p$-DT of depth at most $O(A^2 + \log(1/\epsilon))$ and size $\min\{p^{O(A^2 + \log(1/\epsilon))},\ p^{O_p(A^2)} \cdot O(A^2 + \log(1/\epsilon))^{O_p(A)}\}$, computing a function $g$ such that $\Pr_x[f(x) \ne g(x)] \le \epsilon$.

Proof. The proof is essentially the same as the proof of Lemma 3.6, taking $K = O(A^2 + \log(1/\epsilon))$ (with the hidden constant depending on $c_0$), where $c_0$ is as in Lemma 4.1. Note that in this case, if $|\hat f(\alpha)| \ge 1 - \epsilon$, then since $|\hat f(\alpha)| = |\hat f(-\alpha)|$, by Parseval's identity, if $\epsilon < (1 - 1/\sqrt{2})$ then this can only happen if $\alpha = 0$, hence $f$ is already highly biased. Furthermore, for a random $x \in \mathbb{Z}_p^n$ and fixed $\gamma \in \mathbb{Z}_p^n$, Lemma 4.1 implies a node labeled $\chi_\gamma$ is norm reducing (by $c_0/A$, where $c_0$ is an absolute constant) for $x$ with probability

$$\frac{\lfloor p/3 \rfloor}{p} \ge 1/5,$$

hence a similar argument to the one used in Lemma 3.6 shows that a random input $x$ arrives at an unbiased leaf with probability at most $\epsilon$.

The bound on the tree size, which is $\min\{p^K,\ p^{O_p(A^2)} \cdot K^{O_p(A)}\}$, also follows in the same way as in the proof of Lemma 3.6 using a similar recursion formula whose solution is similar to the formula in Theorem 4.8. □



Finally, we note that although this result is not stated in [KM93], the algorithm from Lemma 3.7 can be modified in the straightforward way to work equally well for functions $f : \mathbb{Z}_p^n \to \{+1,-1\}$, with virtually the same proof.



Proof of Theorem 4.11. We use the algorithm from Lemma 3.7 to find $f$'s largest Fourier coefficient in absolute value, $\hat f(\alpha)$. Whenever $|\hat f(\alpha)| < 1 - \epsilon$, the same algorithm can be used to find the second largest coefficient, $\hat f(\beta)$, in polynomial time (in $n$, $1/\epsilon$ and $\log(1/\delta)$). We use Lemma 4.12 to construct a functional $\oplus_p$-DT, and replace every function-labeled leaf with the constant it is biased towards.

We again mention, as in the proof of Theorem 1.4, that we do not need to calculate $\hat f(\alpha)$ and $\hat f(\beta)$ exactly, but only to within an error of, say, $\epsilon/(2pA)$, which can be guaranteed (with high probability) by the algorithm of Lemma 3.7. □



5 Conclusions and open problems 

In this work we obtained structural results for Boolean functions over $\mathbb{Z}_p^n$ for prime $p$. Our results provide a more refined structure than the one given in the works of Green and Sanders [GS08a, GS08b]. For a certain range of parameters we also obtain improved results in the setting of the works [GS08a, GS08b].

We were also able to achieve new results in the field of computational learning theory by showing that such functions can be learned with $\oplus$-DTs as the class of hypotheses.

There are still many intriguing open problems related to the structure of Boolean functions with small spectral norm. Most of these are related to the tightness of our results (as well as to the tightness of the results of Green and Sanders [GS08a]).

We do not believe that the bound given in Equation (2) is tight. Perhaps it is even true that one could represent $f$ as a sum of polynomially (in $A$) many characteristic functions of subspaces (note that this is not true for functions over general abelian groups; see [GS08b]). Similarly, we do not believe that the bounds we obtain in Theorems 1.2 and 4.8 are tight. It seems more reasonable to believe that the true bound should be $\mathrm{poly}(n, A)$.

Recall that [ZS10, MO09] conjectured that Boolean functions with sparse Fourier spectrum can be computed by a $\oplus$-DT of depth $\mathrm{poly}(\log \mathrm{spar}\, f)$. Theorems 1.3 and 4.10 give an affirmative answer only for the case that $f$ also has a small spectral norm. Thus, the general case is still open.

Finally, Theorems 1.4 and 4.11 give shallow $\oplus_p$-DTs approximating functions with small spectral norm. These results too do not seem tight. In particular, it is interesting to understand whether something better can be obtained if we assume in addition that $f$ can be computed exactly by a small $\oplus_p$-DT. Namely, can one output a shallow $\oplus_p$-DT approximating $f$ over the uniform distribution using polynomially many membership queries (i.e. oracle calls) to $f$, assuming that $f$ can be exactly computed by such a $\oplus_p$-DT (and has a small spectral norm)?

A Proof of Lemma 2.5

The proof of Lemma 2.5 relies upon the following even simpler lemma.



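Lemma A.1. Let $V \subseteq \mathbb{Z}_2^n$ be an affine subspace of co-dimension $k$, and let $\mathbb{1}_V : \mathbb{Z}_2^n \to \{0, 1\}$ be its characteristic function. Then $\mathrm{spar}(\mathbb{1}_V) = 2^k$ and $\|\widehat{\mathbb{1}_V}\|_1 = 1$.

Proof. Denote $V = a + U$ where $U$ is a subspace of co-dimension $k$. There are $k$ vectors $\gamma_1, \ldots, \gamma_k \in \mathbb{Z}_2^n$ (a basis for $U^\perp$) and $b_1, \ldots, b_k \in \{+1, -1\}$ such that $\mathbb{1}_V(x) = 1$ if and only if $\chi_{\gamma_i}(x) = b_i$ for all $1 \le i \le k$. Therefore

$$\mathbb{1}_V(x) = \prod_{i=1}^{k} \left( \frac{1 + b_i \chi_{\gamma_i}(x)}{2} \right).$$

Using the relation $\chi_\beta \chi_\gamma = \chi_{\beta+\gamma}$, and the fact that $\mathrm{span}\{\gamma_1, \ldots, \gamma_k\} = U^\perp$, we get

$$\mathbb{1}_V(x) = \sum_{\gamma \in U^\perp} \pm 2^{-k} \chi_\gamma(x).$$

Since $|U^\perp| = 2^k$, both statements follow. □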
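Lemma A.1 can be verified directly on a small instance. The sketch below takes an affine subspace of $\mathbb{Z}_2^4$ of co-dimension $k = 2$ and computes the sparsity and spectral norm of its indicator. The particular subspace and the helper names are hypothetical choices for illustration.

```python
from itertools import product

def fourier(g, n):
    """Fourier coefficients of a real-valued g on {0,1}^n, by direct summation."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for alpha in points:
        total = 0
        for x in points:
            parity = sum(a * b for a, b in zip(alpha, x)) % 2
            total += g(x) * (-1 if parity else 1)
        coeffs[alpha] = total / len(points)
    return coeffs

def indicator(x):
    # Affine subspace V of Z_2^4 defined by two independent parity constraints:
    #   x0 + x1 = 1 (mod 2)  and  x2 + x3 = 0 (mod 2)
    return 1 if (x[0] + x[1]) % 2 == 1 and (x[2] + x[3]) % 2 == 0 else 0

coeffs = fourier(indicator, 4)
sparsity = sum(1 for c in coeffs.values() if c != 0)
spectral_norm = sum(abs(c) for c in coeffs.values())
```

Here `sparsity` is $2^k = 4$ and `spectral_norm` is 1, with every nonzero coefficient equal to $\pm 2^{-k}$.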



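Proof of Lemma 2.5. Let $L$ be the set of leaves of $T$, and for every $\ell \in L$ let $b_\ell$ be its label, and $\mathbb{1}_\ell : \mathbb{Z}_2^n \to \{0, 1\}$ be the characteristic function of the set of inputs $x$ such that the computation upon $x$ arrives at the leaf $\ell$. Since $T$ computes $f$, we may represent $f$ as:

$$f = \sum_{\ell \in L} b_\ell \mathbb{1}_\ell(x).$$

Now note that if $\ell$'s depth is $t$, then $\mathbb{1}_\ell$ is a characteristic function of an affine subspace of co-dimension $t$. The maximal depth of $T$ is $k$, hence for every $\ell \in L$ we have, by Lemma A.1, $\mathrm{spar}(\mathbb{1}_\ell) \le 2^k$ and $\|\widehat{\mathbb{1}_\ell}\|_1 = 1$. Finally, since $|L| = m$, we get

$$\mathrm{spar}(f) \le \sum_{\ell \in L} \mathrm{spar}(\mathbb{1}_\ell) \le m 2^k,$$

and since $|b_\ell| = 1$, the triangle inequality implies

$$\|\hat f\|_1 \le \sum_{\ell \in L} \|\widehat{\mathbb{1}_\ell}\|_1 \le m.$$

□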



B Proof of Lemma 4.3 



Proof. Suppose without loss of generality (by applying a suitable rotation and reflection if needed) that $z_1 = R$ is a positive real number, and that the angle is exactly $\theta \le \pi$ (i.e. $z_2 = re^{i\theta}$). Note that $|z_1| + |z_2| = R + r$ and $z_1 + z_2 = (R + r\cos(\theta)) + ir\sin(\theta)$. Hence,

$$|z_1 + z_2| = \sqrt{(R + r\cos(\theta))^2 + (r\sin(\theta))^2} = \sqrt{R^2 + r^2 + 2Rr\cos(\theta)}.$$

It remains to be shown that

$$R + r - \sqrt{R^2 + r^2 + 2Rr\cos(\theta)} \ge \frac{1 - \cos(\theta)}{2}\, r.$$

This is equivalent to

$$\left(R + r - \frac{1 - \cos(\theta)}{2}\, r\right)^2 - R^2 - r^2 - 2Rr\cos(\theta) \ge 0.$$


Rearranging and factoring out $r > 0$, we get a linear function in $r$ which is non-negative on both $r = 0$ and $r = R$, which implies the inequality holds for all $0 \le r \le R$.

C Proof of Lemma 4.5

Proof. It is enough to show that on every coset $\gamma \in \mathbb{Z}_p^n/\langle\eta\rangle$:

$$3 \cdot \left( \sum_{k=0}^{p-1} |\hat f(\gamma + k\eta)| - \Big| \sum_{k=0}^{p-1} \hat f(\gamma + k\eta)\,\omega^{\lambda k} \Big| \right) \ge \sum_{k=0}^{p-1} |\hat f(\gamma + k\eta)| + |\hat f(\gamma + (k+1)\eta)| - |\hat f(\gamma + k\eta) + \omega^\lambda \hat f(\gamma + (k+1)\eta)|. \qquad (22)$$

Fix a coset $\gamma$. For $k = 0, \ldots, p-1$ denote $z_k := \hat f(\gamma + k\eta) \cdot \omega^{\lambda k}$. Rewriting Equation (22) under this notation gives

$$3 \cdot \left( \sum_{k=0}^{p-1} |z_k \omega^{-\lambda k}| - \Big| \sum_{k=0}^{p-1} z_k \Big| \right) \ge \sum_{k=0}^{p-1} |z_k \omega^{-\lambda k}| + |z_{k+1}\omega^{-\lambda(k+1)}| - |z_k \omega^{-\lambda k} + z_{k+1}\omega^{-\lambda k}|.$$

Since multiplying by $\omega$ does not change the norm, this is equivalent to

$$3 \cdot \left( \sum_{k=0}^{p-1} |z_k| - \Big| \sum_{k=0}^{p-1} z_k \Big| \right) \ge \sum_{k=0}^{p-1} |z_k| + |z_{k+1}| - |z_k + z_{k+1}|. \qquad (23)$$

We break the right hand side of Equation (23) into 3 sums:

1. $\sum_{k \in \{0, 2, \ldots, p-3\}} |z_k| + |z_{k+1}| - |z_k + z_{k+1}|$

2. $\sum_{k \in \{1, 3, \ldots, p-2\}} |z_k| + |z_{k+1}| - |z_k + z_{k+1}|$

3. $\sum_{k \in \{p-1\}} |z_k| + |z_{k+1}| - |z_k + z_{k+1}|$

Each sum goes over a disjoint set of pairs $(k, k+1)$. Next, we show that each sum is at most $\sum_{k=0}^{p-1} |z_k| - |\sum_{k=0}^{p-1} z_k|$, completing the proof. We claim in general that if $A \subseteq \{0, \ldots, p-1\}$ is a subset such that $1 + A = \{1 + a \bmod p : a \in A\}$ is disjoint from $A$, then

$$\sum_{k \in A} |z_k| + |z_{k+1}| - |z_k + z_{k+1}| \le \sum_{k=0}^{p-1} |z_k| - \Big| \sum_{k=0}^{p-1} z_k \Big|.$$

Let $B := \{0, 1, \ldots, p-1\} \setminus (A \cup (1 + A))$, then $A \cup (1 + A) \cup B$ is a disjoint union of $\{0, \ldots, p-1\}$ and we have:

$$\sum_{k \in A} |z_k| + |z_{k+1}| - |z_k + z_{k+1}|$$

$$= \sum_{k \in A} \big(|z_k| + |z_{k+1}|\big) + \sum_{k \in B} |z_k| - \sum_{k \in B} |z_k| - \sum_{k \in A} |z_k + z_{k+1}|$$

$$= \left( \sum_{k \in A} |z_k| + \sum_{k \in 1+A} |z_k| + \sum_{k \in B} |z_k| \right) - \sum_{k \in B} |z_k| - \sum_{k \in A} |z_k + z_{k+1}|$$

$$\le \sum_{k=0}^{p-1} |z_k| - \Big| \sum_{k \in B} z_k + \sum_{k \in A} (z_k + z_{k+1}) \Big|$$

$$= \sum_{k=0}^{p-1} |z_k| - \Big| \sum_{k \in B} z_k + \sum_{k \in A} z_k + \sum_{k \in 1+A} z_k \Big|$$

$$= \sum_{k=0}^{p-1} |z_k| - \Big| \sum_{k=0}^{p-1} z_k \Big|,$$

where in the inequality we used the triangle inequality.